Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England

View ORCID ProfileThomas Beaney, View ORCID ProfileJonathan Clarke, View ORCID ProfileDavid Salman, Thomas Woodcock, Azeem Majeed, Paul Aylin, View ORCID ProfileMauricio Barahona
doi: https://doi.org/10.1101/2023.06.30.23292080
Thomas Beaney
1Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
2Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, SW7 2AZ, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Thomas Beaney
  • For correspondence: thomas.beaney{at}imperial.ac.uk
Jonathan Clarke
2Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, SW7 2AZ, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Jonathan Clarke
David Salman
1Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
3MSk Lab, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for David Salman
Thomas Woodcock
1Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Azeem Majeed
1Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Paul Aylin
1Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Mauricio Barahona
2Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, SW7 2AZ, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Mauricio Barahona
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Identifying clusters of co-occurring diseases can aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Here, we use data from a population of over ten million people with multimorbidity registered to primary care in England to identify disease clusters through a two-stage process. First, we extract data-driven representations of 212 diseases from patient records employing i) co-occurrence-based methods and ii) sequence-based natural language processing methods. Second, we apply multiscale graph-based clustering to identify clusters based on disease similarity at multiple resolutions, which outperforms k-means and hierarchical clustering in explaining known disease associations. We find that diseases display an almost-hierarchical structure across resolutions from closely to more loosely similar co-occurrence patterns and identify interpretable clusters corresponding to both established and novel patterns. Our method provides a tool for clustering diseases at different levels of resolution from co-occurrence patterns in high-dimensional electronic healthcare record data.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research is funded through a clinical PhD fellowship awarded to TB from the Wellcome Trust 4i programme at Imperial College London. We are grateful for the support of the NIHR Imperial Biomedical Research Centre. JC acknowledges support from the Wellcome Trust (215938/Z/19/Z). DS is supported by an Imperial College and National Institute of Health Research (NIHR) Post-Doctoral, Post-CCT research fellowship. TW, AM and PA acknowledge support from the National Institute for Health and Care Research (NIHR) Applied Research Collaboration Northwest London. MB acknowledges support from EPSRC grant EP/N014529/1 supporting the EPSRC Centre for Mathematics of Precision Healthcare.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Data access to the Clinical Practice Research Datalink (CPRD) and ethical approval was granted by the CPRD Research Data Governance Process on 28th April 2022 (Protocol reference: 22_001818).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

This study uses patient level which is not publicly available but can be requested for users meeting certain requirements: https://cprd.com/research-applications. The code lists and embeddings generated from this work are available to download from: https://tbeaney.github.io/MMclustering/.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted June 30, 2023.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England
Thomas Beaney, Jonathan Clarke, David Salman, Thomas Woodcock, Azeem Majeed, Paul Aylin, Mauricio Barahona
medRxiv 2023.06.30.23292080; doi: https://doi.org/10.1101/2023.06.30.23292080
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Identifying multi-resolution clusters of diseases in ten million patients with multimorbidity in primary care in England
Thomas Beaney, Jonathan Clarke, David Salman, Thomas Woodcock, Azeem Majeed, Paul Aylin, Mauricio Barahona
medRxiv 2023.06.30.23292080; doi: https://doi.org/10.1101/2023.06.30.23292080

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Epidemiology
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)