ABSTRACT
Context Real-life data consist of exhaustive and unbiased data to study drug-safety profiles but are underused because of their complex temporality (i.e., safety depends on the dose, timing, and duration of treatment) and the considerable number of potential side effects to study. We aimed to create a pipeline that manages the complex temporality of real-life data using a data-driven strategy (i.e., without any hypothesis on the potential side effects to search for) to highlight the safety profile of a given drug. We used hydroxychloroquine (HCQ) and its co-prescription in a real-life database to illustrate this pipeline.
Methods We incorporated a weighted cumulative exposure statistical model into a data-driven strategy. This pipeline makes it possible to highlight both long-term and short-term side effects, while avoiding false positives due to the natural course of the underlying disease. We applied the proposed pipeline to a cohort of 2,010 patients with a prescription of HCQ and used their drug prescription as the source of data to highlight the HCQ safety profile.
Results The proposed pipeline introduces a bootstrap strategy into weighted cumulative-exposure statistics estimates to highlight significant drug signals. As applied to HCQ, the proposed pipeline showed nine drugs to be significantly associated with HCQ exposure. Of note, one of them has therapeutic indications for known HCQ side effects. Other associations could be explained by therapeutic indications linked to conditions associated with HCQ indications in France.
Conclusion We propose a data-driven pipeline that makes it possible to provide a broad picture of the side effects of a given drug. It would be informative to pursue the development of this pipeline using other sources of data.
1. INTRODUCTION
Adverse drug reactions (ADRs) have been attributed to causing over 770,000 injuries and 100,000 deaths and $76.6 billion in annual costs in the US (1). The rapid detection of ADRs has become a crucial public health issue. Currently, drug safety is monitored by the pharmacovigilance system, which uses spontaneous reporting systems (SRSs) to detect, collect, and analyze ADRs. However, SRSs suffer from underreporting. Indeed, less than 10% of serious ADRs are reported (2,3). Moreover, this system is subject to biases due to selective reporting (most of the reported cases are considered as suspected ADRs).
The increasing availability of electronic healthcare records (EHRs) offers major opportunities to investigate a wide spectrum of ADRs and detect drug safety signals closer to real use and time, as EHR databases record information for large populations over long follow-up periods (4). These databases, such as electronic medical records and administrative claims databases, have been mostly used to confirm or disprove potential signals flagged by SRSs. A number of data-mining techniques have been specifically developed for the automatic detection of drug-safety signals using either SRS or EHR databases (5–11). Over the last decade, several international initiatives have been developed; the Mini-Sentinel and OMOP (Observational Medical Outcomes Partnership) in the United States and the PROTECT (Pharmaco-epidemiological Research on Outcomes of Therapeutics by a European Consortium) and EU-ADR (Exploring and Understanding Adverse Drug Reactions) in Europe.
The challenge of drug-safety signal-detection methods is to handle four types of difficulties. The first difficulty is the data source. The study of long-term adverse drug reactions or effects not suspected by healthcare professionals requires the use of a real-life data source, such as EHR or claims databases, which do not suffer from the known bias of underreporting and reporting selection (12,13). Many drug-safety methods use diagnostic codes from EHR or claims databases to highlight a signal. In most countries, including France, these diagnostic codes are only reported when patients visit hospitals and are therefore strongly biased towards severe side effects. In addition, the diagnostic codes in these databases are primarily reported for financial purposes (reimbursement) and may subsequently be biased towards diagnoses that can be priced. On the contrary, drug reimbursements provide a less biased data source, although they are less precise because a given drug may correspond to several indications and multiple drugs may be prescribed for a given condition. The second difficulty is the consideration of a broad spectrum of potential side effects, and not only candidate side effects (14), to be able to highlight new signals. The third difficulty is to precisely take into account the temporal aspect.
Time is important, because the type of adverse reactions caused by the medication under study may differ according to the duration of the medication prescriptions. Certain adverse effects may occur soon after the start of the medication under study, whereas others may require a prolonged period of administration to become manifest (15). Safety depends on the dose, timing, and duration of the treatment. The last difficulty concerns distinguishing true side effects from the natural course of the disease. Indeed, the natural course of the disease for which the prescribed drug is indicated may be associated with many other diagnoses, which may be indistinguishable from drug side effects using common drug-safety methods.
In this study, we developed a data-driven pipeline based on a cumulative exposure test to highlight the broad picture of side effects of a given drug. We used the French national medication reimbursement database, and given the Covid19-pandemic context, applied this pipeline to HCQ.
2. Description of the WCE-AI pipeline
The WCE-AI pipeline aims to overcome the four aforementioned difficulties as follows:
- Data source: We used all drug prescriptions as the unique source of side-effect signals. Drugs are represented at the 5th level of the ATC to group all medications at the active substance level that can be prescribed for the same indication.
- Temporality: We modeled the increase in risk related to cumulative dose using splines in Cox proportional hazards models (16,17). This model is called weighted cumulative exposure (WCE). It allows the representation of complex cumulative effects of dose, timing, and duration of interest drug (18).
- Signal sensitivity: We use bootstrapping to infer the distribution of association statistics between the exposure and each ATC class. This provides insight into the broad spectrum of association signals and highlights those that are significant.
- Specificity of side effects: We introduced a covariate representing disease severity built from expert knowledge to automatically pinpoint drugs given for the condition for which the exposure drug has been prescribed.
We first introduce the WCE-AI pipeline, followed by a presentation of a comparative method to our WCE-AI pipeline. Finally, we present a use case of this pipeline and a comparative method applied to HCQ.
2.1 WCE-AI pipeline
The WCE-AI pipeline is divided into three steps: data extraction, data preparation, and safety-profile extraction (Figure 1).
i. Data extraction
All drug prescriptions are extracted, along with their treatment initiation date, end of treatment date, ATC class, and patient characteristics, such as age, sex, and disease severity. Disease severity is defined from expert knowledge on the drugs being indicated for the disease for which the exposure is described.
ii. Data preparation
We built a data frame corresponding to the WCE package (19) data format for each level 5 ATC class. In each data frame, each row corresponds to a given time period, such as day, week, or month (to be defined). Id identifies the patients. Start and Stop identify the beginning and end of each interval (Stop in row n = Start in row n+1). The intervals are closed on the right. For each patient, the first start (start = 0) is the date of the first drug prescription in the study period. Event is a binary indicator for the event of interest, which has a value of 1 if the event occurred in the interval specified by Start and Stop. For a given subject, event = 1 can only occur in the last interval of his or her follow-up. For each data frame, the event corresponds to the prescription of an level 5ATC class. The last four columns represent three fixed-in-time patient covariates (sex, baseline age, and severity), as well as exposure to the drug of interest. The severity covariate is set to 1 if the patient has been reimbursed at least once for the drug being indicated for the same condition for which it was prescribed (to be defined).
iii. Safety-profile extraction
The weighted cumulative exposure (WCE) model is applied to estimate the cumulative effect of duration, dose, and prescription date for the drug of interest (18). The WCE approach estimates the effect of past exposure using a weighted sum of all previous instances of exposure, with weights depending on the time elapsed since exposure and the dose (17,20). This makes it possible to account for the fact that the type of adverse reactions caused by the medication under study may depend on the duration of the medication prescription (certain adverse effects may occur soon after the start of the medication under study, whereas others may require a prolonged period of administration to become manifest). WCE is applied with adjustments for age, sex, and disease severity. For example, to measure the risk of obtaining a prescription of paracetamol/acetaminophen (event) after HCQ exposure (drug of interest), we search for the first prescription of paracetamol/acetaminophen for each patient in the cohort (event). Then, we define potential exposure to HCQ using the WCE method for the six months (time window) before the first paracetamol/acetaminophen prescription (Figure 2).
Finally, a Cox proportional hazards model (HR) is fitted to compare a group exposed to the drug of interest (HCQ in this example) during the time window to a group not exposed during the same period.
To show that a drug is significantly associated with exposure, confidence intervals for each hazard ratio are needed. To estimate such confidence intervals, we chose to use the bootstrap method to extract both the confidence intervals and P-values of the HR and the age/sex/severity coefficient (number of repeats to be defined). The bootstrap method is applied for each unique patient identifier. We constructed bootstrap replicates of the data, each of which was randomly sampled with a replacement. Confidence intervals were estimated using the percentile method:
where
denotes the jth quantile (lower limit) and
denotes the kth quantile (upper limit)
For example, a 95-percentile bootstrap CI with 1,000 bootstrap samples is the interval between the 25th quantile value and the 975th quantile value of the 1,000 bootstrap parameter estimates.
All analyses were performed using R (version 4.0.3) and WCE package (version 1.0.2).
2.2 WCE competitor
We compared WCE to self-control case-crossover (SCCO) (21,22), as it relies on the same type of data. SCCO requires to define risk periods and a wash-out period. The case-defining event for each level 5 ATC class was a prescription of a level 5 ATC class other than HCQ. Four periods were defined for each individual with a prescription for the same level 5 ATC class separated by a washout period: one risk period and three control periods. Each period had the same duration (Figure 3). We performed a sensitivity study by varying the length of the periods (risk and control) between 3, 6, and 9 months, with a washout period of one month. The Bonferroni method was used to adjust the P-values to control the type I error rate in multi-testing.
Schematic representation of a case–crossover analysis. The case-defining event is the prescription of a level 5 ATC class other than HCQ and the prescription is the HCQ drug. Three control periods are selected (shown as CTR) and one risk period (shown as RISK) for each individual. The duration of the periods (risk and control) is set to three months in this example. The washout period (shown as WO) has a standard duration of one month.
2.3. Use case
In the present study, we applied the WCE-AI pipeline to patients newly exposed to HCQ (considered in WCE as the exposure variable). The cohort of patients receiving HCQ was extracted from the EGB, a permanent 1/97 representative sample of the Système National d’Informations Inter-régimes de l’Assurance Maladie (SNIIRAM), which includes the data for 66 million people. The EGB includes data for approximately 780,000 people (23), consisting of de-identified data on demographic characteristics (sex, year of birth, date of death), long-term diseases (ALD), and reimbursed acts (visits, medical procedures, laboratory tests, dispensed drugs, medical devices) (23–25). We extracted the data of all patients with at least one reimbursement for HCQ. We only included patients who did not receive HCQ during the previous year to select only newly exposed patients (Figure 4). For these patients, we extracted the following data: age, sex, date of all drugs deliverances, date of “chronic disease certification status”, which is a marker of disease severity in the French health system.
All the parameters that were used in our study and that can be modified for other studies are presented in Table 1.
The EGB database contains only anonymized data and its access is legally authorized without having to receive authorization from the national data protection agency (CNIL). The study protocol was submitted to the appropriate INSERM and CNAMTS entities, as legally required. In addition, the study was approved by our Institutional Review Board (CER-APHP CENTRE, IRB n°20180603).
RESULTS
The HCQ cohort contains 2,010 patients (nwomen = 1,577, 78%), with 386 different ATC classes. The average age at the time of the first prescription of HCQ was 54.8 (sd = 16.2). Within the cohort, 1,045 (52%) patients had certification (received full reimbursement) for a chronic disease. Among them, 12% (n = 240) each had vasculitis, systemic lupus erythematosus, or systemic scleroderma, 11% (n = 214) rheumatoid arthritis, and 6% (n = 122) each malignant neoplasm or malignant disease of the lymphatic or hematopoietic tissue. The average duration of follow-up was > 10 years (Table 2). However, there was an association between age and HCQ exposure (P < 0.001 by linear regression analysis), as well as between sex and HCQ (P < 0.001 by the Wilcoxon test). There was also a strong association between HCQ exposure and severity (P < 0.001 by the Wilcoxon test).
The WCE-AI pipeline enabled the identification of nine ATC classes associated with the prescription of HCQ, of which five were borderline significant. The highest risk ratios were obtained for hydrocortisone (HR = 3.96 [1.66-7.55]), alendronic acid and cholecalciferol (HR = 3.24 [1.22-7.36]), 2.73 [1.03-6.13], valsartan (HR = 2.73 [1.03-6.13]), and chlormadinone (HR = 2.65 [1.16-4.76]) (Figure 5 & Appendix 1).
The SCCO competitor method identified eight (81 before Bonferroni adjustment of p-values) ATC classes associated with HCQ prescription with control and risk periods of three months, 40 (162 before Bonferroni adjustment of p-values) for those of six months, and 67 (210 before Bonferroni adjustment of p-values) for those of nine months. The top 10 odds ratios per period are presented in Appendix 2.
Top 10 ATC classes associated with hydroxychloroquine prescription in the SCCO model after Bonferroni adjustment of p values
In total, 0, 2, and 3 ATC classes were common between the WCE-AI and SCCO methods for the periods of 3, 6, and 9 months, respectively: anethole trithione and tropicamide (common to WCE-AI, SCCO 6 months and SCCO 9 months) and tixocortol (common to WCE-AI and SCCO 9 months).
DISCUSSION
We report a new data-driven pipeline applied to HCQ on a cohort of 2,010 patients. This pipeline allowed us to identify nine ATC classes linked to HCQ, with the highest being alendronic acid and cholecalciferol, which are prescribed for osteoporosis.
Among the significant results we obtained, tropicamide, which is given in eye examinations for diagnostic purposes, is particularly relevant. Indeed, it is known that HCQ causes retinopathies and this ocular condition is monitored annually in people under such treatment (26). In addition, we identified a relevant signal for metopimazine, an antiemetic, with nausea being a frequently reported HCQ side-effect (27).
We found three drugs with a significant HR, as well as a significant severity coefficient: anethole trithione, a bile secretion-stimulating drug that restores salivation and relieves the discomfort of dry mouth in Sjögren’s syndrome, often associated with lupus (28); hydrocortisone, a glucocorticoid often used for withdrawal from more potent corticosteroids in the treatment of lupus or rheumatoid arthritis; and chlormadinone, a progestin macro-pill recommended as a contraceptive method for women with lupus (29,30). These three drugs, with a significant severity coefficient, are related to the management of the disease treated by HCQ and should not be considered as potential side effects.
Flu vaccine is also a co-prescription of HCQ. Indeed, influenza vaccination is a recommendation for patients with autoimmune disease who receive potentially immunosuppressive treatments (31).
Alendronic acid and cholecalciferol are also a co-prescription of HCQ for optimal management of patients with lupus or rheumatoid arthritis, mostly post-menopausal women, by internists or rheumatologists. In the literature, lupus and menopause are found to be two factors that favor osteoporosis (32,33). We believe that alendronic acid and cholecalciferol are not directly associated with HCQ but rather with the underlying disease.
Tixocortol is a glucocorticoid used as a nasal spray to treat allergic rhinitis. In the literature, there is a link between autoimmune diseases, such as lupus and allergic rhinitis. This link can be explained by the mechanism of these two diseases, which involves immune dysregulation and an increase in inflammatory mediators (34–38).
We did not find any clinically satisfactory explanation for the association with valsartan, an angiotensin II receptor blocker. It is know that HCQ can cause short-term conduction disorders (bundle or atrioventricular block) and, less frequently, long-term morphological abnormalities in heart tissue (myocardial hypertrophy) (39). Valsartan is given for high blood pressure and heart failure. It is impossible to know for which indication valsartan was prescribed, but because the signal is weak (borderline significant) and was not reproduced with the SCCO method, it may be a false positive signal due to multiple testing.
This pipeline did not detect any drugs that could reflect an adverse dermatological effect, even though it has been described in the literature (40).
The results of the SCCO method are difficult to interpret. This method is unable to separate ATC classes that are related to the course of the underlying disease (lupus or rheumatoid arthritis) from those that may represent a side effect.
Study strengths and limitations
A major strength of this study was the use of a population-wide claims database based on the national public healthcare system, making the database population representative of the entire country population (41). The EGB is a particularly suitable basis for post-marketing authorization safety studies to assess the occurrence of rare adverse events (24,25,42–44).
Our data-driven pipeline is based on a self-controlled method, thus avoiding the usual biases of methods such as case-control (45) or propensity score (46) based approaches. In addition, we used exact temporal patterns to detect the broad spectrum of potential side effects of drugs and not only candidate side effects based on a priori hypotheses. Our method, with the exact temporal pattern of drug exposure, can identify short- and long-term side effects (16) and provides the possibility to adjust on covariates.
Our study also had several limitations. Claims databases are primarily used for healthcare reimbursement purposes and we assume that the reimbursed drugs are consumed by the patient. These databases are extensive and can pose problems in terms of the statistical significance of the results and their interpretation. We only used drugs reimbursed outside the hospital. It may be informative to supplement this data source with hospital prescriptions. Another limitation is that our method only considers the co-prescription of drugs as the source of adverse drug reaction signals. Indeed, certain adverse drug effects do not require the prescription of another drug. For example, prolongation of the QT interval after exposure to HCQ cannot be detected by our method because the treatment for prolongation of the QT interval is the discontinuation of HCQ. Similarly, our method will not be able to detect fatal side effects or ototoxicity of HCQ (47).
In conclusion, we present a data-driven pipeline that provides a broad picture of the side effects of a given drug. Applied to HCQ, our pipeline allowed us to show that the adverse effects of HCQ are essentially known and controlled by health professionals.
This pipeline could not highlight all the known side effects of HCQ, such as cardiac toxicity, because it relies on a single source of data, drug prescriptions. It would be useful to pursue the development of this pipeline by bringing in other sources of data.
Data Availability
Data not available
AUTHOR CONTRIBUTIONS
P.S., M.W., J.P., N.D., and AS.J. wrote the manuscript. P.S., N.D., and AS.J. designed the research. P.S. performed the research. P.S., M.W., J.P., N.D., and AS.J. analyzed the data.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the contribution of Marie Jamin and Brigitte Sabatier for their valued comments on this manuscript.
Footnotes
CONFLICT OF INTEREST All authors declare that they have no competing interests for this work.
FUNDING No funding was received for this work.