ABSTRACT
Objective Epidemiological research using electronic healthcare records (EHR), which informs everyday patient care, uses combinations of codes (“codelists”) to define diseases and prescriptions (or phenotypes). Yet methodology for codelist generation varies, which can manifest as misclassification bias, and drug codelists carry their own specific considerations.
Materials and Methods We developed methods to generate drug codelists and tested them using the Clinical Practice Research Datalink (CPRD) Aurum database, accounting for missing data in “attribute” search variables. We generated codelists for 1) cardiovascular disease and 2) inhaled Chronic Obstructive Pulmonary Disease (COPD) therapies, applying them to a sample cohort of 335,931 COPD patients. We compared searching on A) all search variables (“gold standard”) to searching on B) chemical and C) ontological information only.
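The comparison of Searches A, B, and C can be illustrated with a minimal sketch. It is not the authors' implementation: the product rows, the field names (`term`, `substance`, `ontology_code`), the keywords, and the ontology codes are all hypothetical, and the toy output does not reproduce the study's results. It only shows how a product with a missing attribute value can be invisible to a search restricted to that attribute.

```python
# Hypothetical toy product dictionary; field names and codes are invented for illustration.
PRODUCTS = [
    {"code": "p1", "term": "Ipratropium 20microgram inhaler",
     "substance": "ipratropium bromide", "ontology_code": "RESP.SAMA.01"},
    {"code": "p2", "term": "Atrovent 20microgram inhaler",
     "substance": "ipratropium bromide", "ontology_code": None},
    {"code": "p3", "term": "Ipratropium nebuliser solution",
     "substance": None, "ontology_code": None},
    {"code": "p4", "term": "Salbutamol 100microgram inhaler",
     "substance": "salbutamol", "ontology_code": "RESP.SABA.01"},
]

KEYWORDS = ["ipratropium", "atrovent"]   # assumed search terms
ONTOLOGY_PREFIXES = ["RESP.SAMA"]        # assumed ontology section


def search(products, fields):
    """Return the set of product codes matching in any of the chosen fields."""
    hits = set()
    for row in products:
        for field in fields:
            value = row.get(field)
            if value is None:
                continue  # missing attribute: this field cannot produce a match
            if field == "ontology_code":
                matched = any(value.startswith(p) for p in ONTOLOGY_PREFIXES)
            else:
                matched = any(k in value.lower() for k in KEYWORDS)
            if matched:
                hits.add(row["code"])
                break
    return hits


search_a = search(PRODUCTS, ["term", "substance", "ontology_code"])  # all variables
search_b = search(PRODUCTS, ["substance"])                           # chemical only
search_c = search(PRODUCTS, ["ontology_code"])                       # ontology only

for name, result in [("A", search_a), ("B", search_b), ("C", search_c)]:
    print(name, sorted(result))
```

In this toy data, the restricted searches return fewer codes than Search A only because attribute values are missing, which is the mechanism the full method is designed to guard against.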
Results In Search A we identified 165,150 patients prescribed cardiovascular drugs (49.2% of cohort) and 317,963 prescribed COPD inhalers (94.7% of cohort). Considering output per value set, Search C missed substantial numbers of prescriptions, including vasodilator anti-hypertensives (A and B: 19,696 prescriptions; C: 1,145) and SAMA inhalers (A and B: 35,310; C: 564).
Discussion We recommend the full methods (A) for comprehensiveness. There are special considerations when generating adaptable and generalizable drug codelists, including fluctuating status, cohort-specific drug indications, underlying hierarchical ontology, and statistical analyses.
Conclusions Methods must have end-to-end clinical input, and be standardizable, reproducible, and understandable to all researchers across data contexts.
LAY ABSTRACT Health research using patient records informs everyday medicine, using groups of codes (“codelists”) to define diseases and drugs. Yet methods to create drug codelists are inconsistent, may not include physician expertise, and may not be reported.
We developed a reproducible method to create drug codelists, testing it using de-identified healthcare records. We generated codelists for 1) heart conditions and 2) inhalers to identify prescriptions in a sample group of 335,931 patients with chronic lung disease. We compared our full methods (Search A) to two restricted searches to show prescriptions can be missed if necessary considerations are not made.
In Search A, we identified 165,150 people (49.2% of sample group) prescribed drugs from the heart codelist and 317,963 people (94.7% of group) prescribed lung inhalers. Search C missed substantial numbers of prescriptions for drugs lowering blood pressure by opening vessels (A and B: 19,696 prescriptions; C: 1,145) and for short-acting inhalers opening airways (A and B: 35,310; C: 564).
We recommend the full methods (A) for completeness. Drug codelist methods must be consistent, duplicable, and include physician input at all research stages. They also carry special considerations, including drug status (eg, new, taken off market), disease, and drug categorical system. Quality methods should be freely accessible and usable across study contexts.
Competing Interest Statement
JQ has received grants from MRC, HDR UK, GSK, BI, asthma+lung UK, and AZ and personal fees for advisory board participation, consultancy or speaking fees from GlaxoSmithKline, Evidera, AstraZeneca, and Insmed. NP has received funding from Imperial Health Charity. SD is supported by the BHF Data Science Centre led by HDR UK (grant SP/19/3/34678), BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement 116074, the NIHR Biomedical Research Centre at University College London Hospital NHS Trust (UCLH BRC), a BHF Accelerator Award (AA/18/6/24223), the CVD-COVID-UK/COVID-IMPACT consortium and the Multimorbidity Mechanism and Therapeutic Research Collaborative (MMTRC, grant number MR/V033867/1). PS reports grants from asthma+lung UK and Gilead. EG, GM, AA, and SH have nothing to disclose.
Funding Statement
No funding is reported for this study. This research was supported by the NIHR Imperial Biomedical Research Centre (BRC).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Clinical Practice Research Datalink (CPRD) has NHS Health Research Authority (HRA) Research Ethics Committee (REC) approval to allow the collection and release of anonymised primary care data for observational research [NHS HRA REC reference number 05/MRE04/87]. Each year CPRD obtains Section 251 regulatory support through the HRA Confidentiality Advisory Group (CAG), to enable patient identifiers, without accompanying clinical data, to flow from CPRD contributing GP practices in England to NHS Digital, for the purposes of data linkage [CAG reference number 21/CAG/0008]. The CPRD Research Data Governance (RDG) committee gave ethical approval for this work (protocol number 22_002515) and the approved protocol is available upon request. Linked pseudonymised data was provided for this study by CPRD. Data is linked by NHS Digital, the statutory trusted third party for linking data, using identifiable data held only by NHS Digital. Select general practices consent to this process at a practice level with individual patients having the right to opt-out.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Discussion updated with further description of current literature and tools. Figures 2, 3, and 4 added to enhance understanding of methodology. Tables/data underlying Figures 3 and 4 moved to supplemental file (supplemental files updated).
Data Availability
Data may be obtained from a third party and are not publicly available. Data are available on request from CPRD. CPRD data provision requires purchase of a license, and this license does not permit the authors to make them publicly available to all.
https://github.com/NHLI-Respiratory-Epi/drug-codelist-creation