
Determining prescriptions in electronic healthcare record (EHR) data: methods for development of standardized, reproducible drug codelists

Emily L Graul, Philip W Stone, Georgie M Massen, Sara Hatam, Alexander Adamson, Spiros Denaxas, Nicholas S Peters, Jennifer K Quint
doi: https://doi.org/10.1101/2023.04.14.23287661
Emily L Graul
1School of Public Health, Imperial College London, London, UK
  • For correspondence: e.graul{at}imperial.ac.uk
Philip W Stone
1School of Public Health, Imperial College London, London, UK
2National Heart & Lung Institute, Imperial College London, London, UK
Georgie M Massen
2National Heart & Lung Institute, Imperial College London, London, UK
Sara Hatam
3Usher Institute, University of Edinburgh, Edinburgh, UK
Alexander Adamson
1School of Public Health, Imperial College London, London, UK
2National Heart & Lung Institute, Imperial College London, London, UK
Spiros Denaxas
4University College London, Institute of Health Informatics, London, UK
5British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
Nicholas S Peters
2National Heart & Lung Institute, Imperial College London, London, UK
Jennifer K Quint
1School of Public Health, Imperial College London, London, UK
2National Heart & Lung Institute, Imperial College London, London, UK

ABSTRACT

Objective Epidemiological research using electronic healthcare records (EHR), which informs everyday patient care, uses combinations of codes (“codelists”) to define diseases and prescriptions (or phenotypes). Yet methodology for codelist generation varies, manifesting as misclassification bias, and drug codelists carry their own specific considerations.

Materials and Methods We developed methods to generate drug codelists and tested them using the Clinical Practice Research Datalink (CPRD) Aurum database, accounting for missing data in “attribute” search variables. We generated codelists for 1) cardiovascular disease drugs and 2) inhaled chronic obstructive pulmonary disease (COPD) therapies, and applied them to a sample cohort of 335,931 COPD patients. We compared A) searching on all search variables (the “gold standard”) with searching on B) chemical information only and C) ontological information only.
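The comparison of Searches A, B, and C can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Product` fields, the toy dictionary rows, the placeholder ontology code strings, and the assignment of name-based fields to Search B and the ontology field to Search C are all hypothetical. It shows only why a search restricted to one attribute can miss products whose value for that attribute is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    """One row of a hypothetical drug product dictionary."""
    code: str
    term: str                   # free-text product description
    substance: Optional[str]    # chemical attribute, may be missing
    ontology: Optional[str]     # ontological (chapter) attribute, may be missing


# Toy rows invented for illustration; the ontology codes are placeholders.
DICTIONARY = [
    Product("p1", "Hydralazine 25mg tablets", "hydralazine", "2050100"),
    Product("p2", "Hydralazine 50mg tablets", "hydralazine", None),
    Product("p3", "Apresoline 25mg tablets", None, None),
    Product("p4", "Ipratropium 20microgram inhaler", "ipratropium", "3010200"),
    Product("p5", "Paracetamol 500mg tablets", "paracetamol", "4070100"),
]


def matches(p, terms, substances, ontology_prefixes):
    """True if the product is hit by any of the supplied search variables."""
    term_hit = any(t in p.term.lower() for t in terms)
    # A missing attribute can never match, so those rows rely on other fields.
    substance_hit = p.substance is not None and p.substance.lower() in substances
    ontology_hit = p.ontology is not None and p.ontology.startswith(
        tuple(ontology_prefixes)
    )
    return term_hit or substance_hit or ontology_hit


def search(products, terms=(), substances=(), ontology_prefixes=()):
    """Return the sorted product codes matched by the given search variables."""
    return sorted(
        p.code for p in products
        if matches(p, terms, substances, ontology_prefixes)
    )


TERMS = ("hydralazine", "apresoline")   # generic and brand keywords
SUBSTANCES = ("hydralazine",)
ONTOLOGY = ("20501",)                   # placeholder chapter prefix

search_a = search(DICTIONARY, TERMS, SUBSTANCES, ONTOLOGY)  # all variables
search_b = search(DICTIONARY, TERMS, SUBSTANCES)            # name-based only
search_c = search(DICTIONARY, ontology_prefixes=ONTOLOGY)   # ontology only

print("A:", search_a)
print("B:", search_b)
print("C:", search_c)
```

In this toy dictionary, Searches A and B return the same three products, while Search C returns only the one product that carries an ontology code; the other two are missed because that attribute is missing.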

Results In Search A we identified 165,150 patients prescribed cardiovascular drugs (49.2% of cohort) and 317,963 prescribed COPD inhalers (94.7% of cohort). Considering output per value set, Search C missed substantial numbers of prescriptions, including vasodilator antihypertensives (A and B: 19,696 prescriptions; C: 1,145) and short-acting muscarinic antagonist (SAMA) inhalers (A and B: 35,310; C: 564).
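The reported percentages can be reproduced from the counts given in this abstract. The short Python check below uses only those counts; the variable names are illustrative, and the share of prescriptions captured by Search C is a derived figure, not one reported by the authors.

```python
COHORT = 335_931  # COPD patients in the sample cohort

# Patients identified by Search A, per codelist.
patients_a = {"cardiovascular drugs": 165_150, "COPD inhalers": 317_963}

# Prescriptions found per value set: (Searches A and B, Search C).
prescriptions = {
    "vasodilator antihypertensives": (19_696, 1_145),
    "SAMA inhalers": (35_310, 564),
}

# Percentage of the cohort prescribed each drug group.
cohort_pct = {
    name: round(100 * n / COHORT, 1) for name, n in patients_a.items()
}

# Percentage of Search A prescriptions that Search C also captured.
captured_by_c_pct = {
    name: round(100 * c / a, 1) for name, (a, c) in prescriptions.items()
}

for name, pct in cohort_pct.items():
    print(f"{name}: {pct}% of cohort")
for name, pct in captured_by_c_pct.items():
    print(f"{name}: Search C captured {pct}% of Search A prescriptions")
```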

Discussion We recommend the full methods (A) for comprehensiveness. There are special considerations when generating adaptable and generalizable drug codelists, including fluctuating status, cohort-specific drug indications, underlying hierarchical ontology, and statistical analyses.

Conclusions Methods must have end-to-end clinical input, and be standardizable, reproducible, and understandable to all researchers across data contexts.

LAY ABSTRACT Health research using patient records informs everyday medicine, using groups of codes (“codelists”) to define diseases and drugs. Yet methods to create drug codelists are inconsistent, and may neither include physician expertise nor be reported.

We developed a reproducible method to create drug codelists, testing it using de-identified healthcare records. We generated codelists for 1) heart conditions and 2) inhalers to identify prescriptions in a sample group of 335,931 patients with chronic lung disease. We compared our full methods (Search A) to two restricted searches to show prescriptions can be missed if necessary considerations are not made.

In Search A, we identified 165,150 people (49.2% of the sample group) prescribed drugs from the heart codelist. For lung inhalers, we identified 317,963 people prescribed them (94.7% of the group). Search C missed substantial numbers of prescriptions for drugs lowering blood pressure by opening vessels (A and B: 19,696 prescriptions; C: 1,145) and for short-term inhalers opening airways (A and B: 35,310; C: 564).

We recommend the full methods (A) for completeness. Drug codelist methods must be consistent, duplicable, and include physician input at all research stages. They also carry special considerations, including drug status (eg, new, taken off market), disease, and drug categorical system. Quality methods should be freely accessible and usable across study contexts.

Competing Interest Statement

JQ has received grants from MRC, HDR UK, GSK, BI, asthma+lung UK, and AZ, and personal fees for advisory board participation, consultancy or speaking fees from GlaxoSmithKline, Evidera, AstraZeneca, and Insmed. NP has received funding from Imperial Health Charity. SD is supported by the BHF Data Science Centre led by HDR UK (grant SP/19/3/34678), the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement 116074, the NIHR Biomedical Research Centre at University College London Hospital NHS Trust (UCLH BRC), a BHF Accelerator Award (AA/18/6/24223), the CVD-COVID-UK/COVID-IMPACT consortium, and the Multimorbidity Mechanism and Therapeutic Research Collaborative (MMTRC, grant number MR/V033867/1). PS reports grants from asthma+lung UK and Gilead. EG, GM, AA, and SH have nothing to disclose.

Funding Statement

No funding is reported for this study. This research was supported by the NIHR Imperial Biomedical Research Centre (BRC).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Clinical Practice Research Datalink (CPRD) has NHS Health Research Authority (HRA) Research Ethics Committee (REC) approval to allow the collection and release of anonymised primary care data for observational research [NHS HRA REC reference number 05/MRE04/87]. Each year CPRD obtains Section 251 regulatory support through the HRA Confidentiality Advisory Group (CAG), to enable patient identifiers, without accompanying clinical data, to flow from CPRD contributing GP practices in England to NHS Digital, for the purposes of data linkage [CAG reference number 21/CAG/0008]. The CPRD Research Data Governance (RDG) committee gave ethical approval for this work (protocol number 22_002515) and the approved protocol is available upon request. Linked pseudonymised data was provided for this study by CPRD. Data is linked by NHS Digital, the statutory trusted third party for linking data, using identifiable data held only by NHS Digital. Select general practices consent to this process at a practice level with individual patients having the right to opt-out.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Discussion updated with further description of current literature and tools. Figures 2, 3, and 4 added to enhance understanding of methodology. Tables/data underlying Figures 3 and 4 moved to supplemental file (supplemental files updated).

Data Availability

Data may be obtained from a third party and are not publicly available. Data are available on request from CPRD. CPRD data provision requires purchase of a license, and this license does not permit the authors to make them publicly available to all.

https://github.com/NHLI-Respiratory-Epi/drug-codelist-creation

https://github.com/NHLI-Respiratory-Epi/SNOMED-CT-codelists

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted June 23, 2023.
Subject Area

  • Epidemiology