Abstract
Samples can be prone to ascertainment and attrition biases.The Australian Genetics of Depression Study is a large publicly recruited cohort (n=20,689) established to increase the understanding of depression and antidepressant treatment response. As part of the recruitment, participants donated a saliva sample and were given the option to consent to linkage of prescription records for research purposes. This study investigates differences between participants who donated a saliva sample or agreed to linkage of their records compared to those who did not. We observed that older, male participants with a higher education were more likely to donate a saliva sample. Self-reported bipolar disorder, ADHD, panic disorder, PTSD, substance use disorder and social anxiety disorder were associated with lower odds of donating a saliva sample whereas anorexia was associated with higher odds of donation. Male and younger participants showed higher odds of agreeing to record linkage. Participants with higher neuroticism scores and those with a history of bipolar disorder were also more likely to agree to record linkage whereas participants with a diagnosis of anorexia were less likely to agree. Increased likelihood of consent was also associated with increased genetic susceptibility to anorexia and reduced genetic risk for depression, and schizophrenia whereas there was no significant genetic effect for neuroticism. Overall, our results show moderate differences among these subsamples. Most current epidemiological studies do not adjust, nor search, for attrition biases at the genetic level. The possibility to do so is a strength of samples such as the AGDS. Our results suggest that analyses can be made more robust by identifying attrition biases both on the phenotypic and genetic level, and either contextualising them as a potential limitation or performing sensitivity analyses adjusting for them.
Introduction
The Australian Genetics of Depression Study (AGDS) is a large cohort study including more than 20,000 participants. Recruitment targeted individuals who had been diagnosed or treated for depression [1]. The majority (75%) of the participants in the study are women. The mean age (at the time of recruitment) was 43 years with a standard deviation of 15 years. A high percentage of the participants (95%) have been diagnosed with depression, and similar to what is seen in population studies, 68% of the cohort reported a history of at least one other comorbid mental health diagnosis. The AGDS collected a vast amount of phenotypic data through online questionnaires, as well as a biological specimen for genotyping. A key component of the study was the optional consent to linkage of participants’ prescription history with the study data. These datasets enable a variety of novel analyses such as corroboration of medication self-reports as well as health economics and comorbidity analyses.
Public recruitment practices can be prone to biases as they indirectly target a portion of the population that consists of individuals that are willing to participate in the study and that decision could lead to an over or under representation (relative to the population) of certain traits [2], and even specific disorders [3] that correlate with willingness to participate. A similar phenomenon may occur among participants in their completion of optional modules or participation in subsequent waves of data collection. We refer to this phenomenon as attrition. Previous studies have shown that attrition can influence measures of association [4] by creating a bias in the prevalence and incidence of the variables studied. An example of this is collider bias, an artificial association between two variables as the product of either adjusting for a covariate that is actually an outcome of the two variables studied [5], or sampling strategies which modify the likelihood of recruiting participants with specific values for the common outcome [6].
Large cohorts and genetic studies are not free of sampling and attrition biases. Evidence of such include the observation of unexpected autosomal heritability of sex in cohorts with active recruitment, which could possibly reflect differences in genetic factors driving participation in males and females [7], and genetic factors associated with participating in optional subsections of surveys of the UK-Biobank [8,9], all of which maylead to incorrect inferences in downstream analyses. Identifying and quantifying selection biases in cohorts and subsamples is necessary to identify potentially spurious findings, or to account for them statistically where possible. The objective of this study was to evaluate selection biases in two aspects of participation in the ADGS: donating a saliva sample for genotyping and agreeing to linkage to medical prescription records (PBS linkage) in the AGDS. We did this by investigating sociodemographic and psychiatryc differences between participants who donated a saliva sample compared to those who did not. We further tested whether there is evidence for heritability to agree to PBS linkage and investigated sociodemographic, clinical and polygenic differences between participants who agreed to PBS linkage and those who did not. Finally, we compare our results across the two participation measures and with studies of participation in other cohorts such as the UK-Biobank.
Methods
Sample recruitment
The Australian Genetics of Depression Study recruited 22,424 Australian participants through two avenues: a mail-out to patients who had at least four prescriptions for antidepressants in the previous 5 years (14%) and media campaign to recruit patients who had received a diagnosis of depression from a doctor, psychiatrist or psychologist (86%). Potential participants were directed to the AGDS website (https://www.geneticsofdepression.org.au), and informed consent was gathered prior to data collection through online questionnaires. The AGDS inclusion criteria included (i) reporting that they had been treated for depression by a health professional (ii) agreeing to donate a saliva sample for genotyping (although only 72.5% actually did so). Full details for the AGDS can be found elsewhere [1].
Among the phenotypes collected, participants confirmed whether they had taken any of the ten most commonly prescribed antidepressants in Australia. For each antidepressant taken, we gathered data on antidepressant efficacy and experienced side effects. Data on demographics, clinical history of psychiatric disorders as well as personality traits (neuroticism and extraversion scores) were also collected. Furthermore, willing participants provided optional consent to record linkage of their Pharmaceutical Benefits Scheme (PBS) prescription and Medical Benefits Scheme (MBS) records, which could be used to validate medication self-reported data, assess concurrent medications and infer comorbidities.
In this study, we focus on 1) whether participants agreed to record linkage of their PBS records for research purposes as the outcome of interest, and 2) whether they donate a saliva sample for DNA genotyping. The full list and details of instruments used for AGDS phenotyping are available at https://bit.ly/3y72lyg. All participants provided informed consent prior to participating in the study. The QIMR Berghofer Medical Research Institute Human Research Ethics Committee approved all questionnaires and research procedures for the AGDS under project number P2118.
Genotyping imputation and quality control
Upon completion of the core questionnaire, participants were mailed a GeneFix GFX-02 2mL saliva DNA extraction kit (Isohelix plc) to use at home and then returned by mail for subsequent genotyping. The AGDS sample was genotyped using the Illumina Global Screening Array (GSA V.2.0). Genotype data were cleaned by removing unknown or ambiguous map position, strand alignment, high missingness (>5%), deviation from Hardy-Weinberg equilibrium, low minor allele frequency (<1%), and GenTrain score <0.6 variants. Imputation was performed through the Michigan imputation server web service using the HRCr1.1 reference panel, as the majority of the cohort were of European ancestry. Genotyped individuals were excluded from polygenic risk score (PRS) analyses based on high genotype missingness, inconsistent and unresolvable sex, or if deemed ancestry outliers from the European population, based on principal components derived from the 1000Genomes reference panel (defined as >6SD from the PC1/PC2 centroid).
SNP-based heritability
We employed genome-based restricted maximum likelihood (GREML) as implemented in GCTAv.1.91.7 [10] to estimate the proportion of variance in consent to record linkage (on the observed scale) explained by measured genetic differences (SNP-based heritability). This approach leverages a genetic-relatedness matrix and restricted maximum likelihood to partition the variance of a phenotype into a genetic and environmental component [11]. The underlying logic is to assess whether genetic covariation explains a significant proportion of the covariation of a trait. For this study, a GRM based on a subset of unrelated (genomic relatedness cutoff < 0.05) individuals of European ancestry was employed to identify evidence for a genetic component to consenting record linkage of PBS records.
Polygenic risk scores (PRS)
We computed PRS in order to test to what extent the genetic risk for traits that showed a phenotypic association, or have previously been linked to attrition in other studies (see discussion), predicted consent to PBS record linkage. For the PRS predictions, we excluded European ancestry outliers and used only one member from groups of related individuals (genomic relatedness cutoff < 0.05) to avoid confounding from cryptic relatedness, as this violates the independence assumption of a classical logistic regression. We estimated PRS for educational attainment [12], neuroticism [13], major depressive disorder [14], bipolar disorder [15], schizophrenia (SCZ) [16], and anorexia nervosa [17] using GWAS summary statistics without sample overlap with the AGDS cohort. SBayesR was used to estimate the joint GWAS effect sizes adjusting for the correlation between SNPs [18]. Prior to estimating PRS, we excluded low imputation quality (r2 < 0.6), MAF < 0.01, non-autosomal and strand-ambiguous variants. Imputed genotype dosage data were used to calculate PRS by multiplying the variant effect size by the dosage of the effect allele. Finally, the total sum was calculated across all variants. This procedure was performed using Plink 1.9 [19]. A relevant follow up question is how much of the SNP-based heritability is explained jointly by these genetic factors. To estimate the proportion of SNP-based heritability explained by these PRS, a secondary GREML analysis including all nominally significant PRS (SZC, MDD and anorexia) as a fixed effect was performed, and the SNP based heritability of both models (with and without PRS) were compared using the formula where VG0 is the genetic variance component estimated in the null model (i.e., without PRS) and VGadj is the same estimate but of the model including PRS for SCZ, MDD and anorexia as fixed effects. This approach is likely to yield an underestimate as the GRM used to estimate SNP-based heritability is constructed using SNPs also included in the PRS.
Statistical analyses
Logistic regression was used to examine the association between consent to record linkage (yes/no) and variables of interest including age, sex, educational attainment, psychopathology, and PRS. The non-genetic regressions were adjusted for sex and age at study enrollment. Regressions using genetic variables included age, sex and the first 20 genetic principal components to adjust for potential population stratification. Variance explained was estimated by the regression coefficient of determination squared (R2). Results assessing the proportion of SNP-based heritability used linear mixed effects models fit via restricted maximum likelihood. Nominally significant results are defined as those with p<0.05, and statistical significance was defined after Bonferroni correction for multiple testing.
Results
Demographic factors and samples description
Table 1 shows demographic information across AGDS participants who did or did not consent to record linkage. We also contrast demographic factors between participants who donated a saliva sample for genotyping and those who did not. Participants agreeing to record linkage consent were, on average, older (OR=1.016; 95%C.I.=[1.01-1.02] per year of age). Female participants were less likely to provide consent for record linkage (OR=0.69; 95%C.I.=[0.63-0.74]). Overall the cohort’s educational attainment is high (e.g. ∼25% of participants reported having a postgraduate degree). We did not observe a significant association between educational attainment and providing consent for record linkage (OR=1.02; 95%C.I.=0.99-1.05). Being male (OR=1.13; 95%C.I.=[1.05-1.23]), older (OR=1.012; 95%C.I.=[1.010-1.015] per year of age) and having a higher educational attainment (OR=1.18; 95%C.I.=[1.14-1.21] per educational category) were all associated with donating a saliva sample for genotyping.
Associations with psychiatric and personality traits
The association between self-reported lifetime psychiatric diagnosis and differential consent for record linkage or biological sample are shown in Table 2. Participants with self-reported diagnosis of bipolar disorder (OR=1.29 95%C.I[1.16-1.44]) or higher neuroticism score (OR=1.02 95%C.I.=[1.01-1.04]) were more likely to consent to record linkage. In contrast, participants who reported a diagnosis of anorexia (OR=0.68 95%C.I.=[ 0.60 - 0.77]) were less likely to consent to record linkage. There was nominal association between consent to record linkage and participants reporting a lifetime diagnosis of personality disorder (OR= 1.19 95% C.I.=[1.05 - 1.36], substance use disorder (OR= 1.26 95% C.I.=[1.06 - 1.49] or anxiety disorder (OR=1.07 95% C.I.=[1.01 - 1.14]). For the outcome of donating a saliva sample, the following were associated with lower odds of donation, bipolar disorder (OR= 0.83 95% C.I.=[0.75 - 0.92]), ADHD (OR= 0.73 95% C.I.=[0.63 - 0.85]), panic disorder (OR= 0.75 95% C.I.=[0.68 - 0.83]), PTSD (OR= 0.81 95% C.I.=[0.74 - 0.88]), substance use disorder (OR= 0.78 95% C.I.=[0.66 - 0.9]) and social anxiety disorder (OR= 0.93 95% C.I.=[0.88 - 0.99]). Conversely, participants reporting a lifetime diagnosis of anorexia showed higher odds (OR=1.84 95%C.I.=[01.81-2.15]) of providing a saliva sample.
Genetic factors
This section focuses on consenting to record linkage amongst those who provided a DNA sample. Necessarily, genetic analyses comparing participants who did or did not provide a biological specimen are not possible. Amongst those who provided a biological sample, 79% consented to PBS, implying a binomial variance of 0.79 (1- 0.79) = 0.17 on the observed scale. The GREML analysis suggested the presence of a genetic contribution to the likelihood of consenting to record linkage. The SNP-based heritability on the observed scale was 0.12 (S.E.= 0.03, p=1.1e-7, phenotypic variance on the observed scale∼0.16). We hypothesised that the genetic risk (operationalised as PRS) for the psychiatric traits identified above (e.g., bipolar disorder, anorexia or neuroticism) would be associated with differential consent for record linkage.
To assess specific genetic factors, we used logistic regressions to test for association between PRS for neuroticism, MDD, SCZ or EA. PRS were validated by first predicting the specific trait of interest. All PRS were predictive of their respective traits (Table 3). For example, neuroticism PRS was strongly associated with neuroticism score. Anorexia PRS (OR= 0.93 95% C.I.=[0.89 - 0.97]) and SCZ PRS (OR= 0.92 95% C.I.=[0.88 - 0.97]) were associated with lower odds of consenting to linkage. Conversely, MDD PRS was associated with higher odds of consent to record linkage (OR= 1.06 95% C.I.=[1.02 - 1.10]; Table 3). Using a GREML analysis, we estimated that MDD, SCZ and anorexia PRS accounted for around 12% of the SNP-based heritability of consent to record linkage (see Methods) which would account for around 1.4% of the total phenotypic variance on the observed scale. Notably all PRS were still significantly associated with agreeing to record linkage when jointly estimating their effects using GREML (MDDPRS beta=0.012 S.E.=3.7e-3; SCZPRS beta=-0.013 S.E.=4.9e-3; AnorexiaPRS=-9.2e-3 S.E.=3.8e-3).
Discussion
This study investigated systematic differences between AGDS participants who consented to record linkage of prescription history and those who did not. We also investigated sociodemographic differences associated with the decision to donate a saliva sample for DNA analysis. Our study is motivated by the need to acknowledge and identify biases that could limit the generalisability of findings from prescription history data, and potentially lead to spurious associations.
Educational attainment has been reported to be a relevant factor for voluntary participation in medical research [9]. In fact, the AGDS has been shown to be a highly educated sample compared to the Australian population [1]. Here, we identified a moderate association between educational attainment and donating a biological sample, but the relationship between educational level and consent to record linkage did not reach statistical significance.
A recent study analysed genetic and demographic factors related to whether UK-Biobank participants shared their email for a follow up questionnaire. Women and individuals at high genetic risk for schizophrenia were less likely to share their email [8]. While we also identified that females were less likely to consent to record linkage, we found no association between consent to record linkage and a self-reporting of schizophrenia diagnosis. Nonetheless, we found a significant inverse association between SCZ PRS and consent to record linkage, which would align with the findings in the UK-Biobank and suggest our lack of phenotypic association to be due to lack of power from the low number of participants reporting schizophrenia or due to potential unreliability of a self-reported measure. Increased genetic risk for schizophrenia is likely related to higher basal suspiciousness, a heritable trait related to psychiatric disorders [20] which likely leads to a lower desire to record linkage.
Similar research has proposed that participants with higher neuroticism score and depression are less likely to participate in follow-up questionnaires. [9]. Participants with these characteristics may be more likely to experience feelings of anger, anxiety and irritability [21] when prompted with follow-up research surveys. In our study, individuals with higher neuroticism scores were more likely to consent to record linkage of their prescription history. We also report an association between neuroticism PRS and neuroticism score. However, the neuroticism PRS was not statistically associated with consent for record linkage. A similar result (i.e. phenotypic association, valid PRS but no association between PRS and consent to record linkage) for Bipolar disorder was observed. This observation could be explained by the fact that PRS does not capture all of the heritability for a trait and thus suffer from reduced power, but it may also imply that the relationship between neuroticism and consent to record linkage is mediated solely through the environmental component of neuroticism.
Participants reporting a lifetime diagnosis of bipolar disorder were more likely to consent to record linkage of their prescription history and less likely to provide a saliva sample for genotyping. Furthermore, genetic risk for depression was nominally associated with greater odds of consenting for record linkage, which is the reverse direction to the previous reports [9]. Participants who reported anorexia were less likely to consent to record linkage, but more likely to provide a biological specimen. This finding may be related to personality characteristics linked to anorexia, such as behaviors in relation to losing control [22]; consenting to access to the clinical prescription history could be violating the concept of having control [23]. However, this explanation fails to explain the positive association with donating a biological specimen.
Overall, the lack of consistency between demographic and clinical factors associated with consent for record linkage and donating a saliva sample, coupled with the results of other studies of participation and attrition, leads us to hypothesise that there is no single set of factors underlying attrition to different designs an settings of follow up studies. Evidence of differential factors in our study include the fact that previously reported negative predictors of participation, such as neuroticism and genetic risk for depression [9], were positively associated with consent to record linkage in our study; that educational attainment was associated with donating a saliva sample, but not with consent to record linkage and that a lifetime diagnosis of anorexia was associated with both record linkage and saliva donation but with opposing effects.
It is important to highlight some limitations of this study. The AGDS sample is ascertained for people who have a lifetime history of depression, the majority of whom have been prescribed antidepressants. These participants seem to be highly motivated to participate in the research and are highly enriched for help-seeking behaviours, making it a sensitised subset of the population. The diagnoses studied relied on participant self-report, which could lower the reliability of these diagnoses; however, as mental health disorders are highly impactful, a participants’ likelihood to accurately report the diagnoses communicated to them is high. Additionally, participants were informed of the time-period that would be covered by the record linkage and it is possible that participants who had not experienced a depressive episode in this time period may have been less likely to consent for record linkage (the previous 4.5 years from recruitment). It is also important to note that only Australian citizens and permanent residents have access to the PBS system and we did not collect information on legal citizenship or residentship status. Military personnel and their families do not use the MBS/PBS system. As such, non-residents and military personnel (and their families) would not be able to consent to PBS record linkage. Finally, within the genetic analyses, we excluded participants of non-European ancestry to avoid population stratification. For this reason, care should be taken when generalising these findings to populations with other ancestral backgrounds.
Overall our results suggest participation biases to be specific to the design and nature of the study. Our results show that not all observed differences are recapitulated through polygenic differences. These observations imply that using genetics is not a perfect tool to measure and adjust for unobserved biases. Conversely, subtle genetic differences may not be statistically identified using non-genetic data. For example, an association between SCZ PRS and consent to record linkage was not obvious from testing for association with a diagnosis for schizophrenia. The power to detect these differences using PRS should increase as discovery GWAS sample sizes increase. Thus, PRS will likely be a useful tool against attrition biases. Longitudinal studies with genotype data, such as the AGDS, enable us to identify these biases. Future studies leveraging the prescription data should consider these differences, and perform sensitivity analyses assessing whether their findings could be attributed to genetic risk for schizophrenia, depression or anorexia. Most current epidemiological studies do not adjust nor search for attrition biases at the genetic level. The possibility to do so is a strength of samples such as the AGDS. An example of such an approach is a recent study on depression identifying an unexpected positive genetic correlation between depression and cognitive traits was identified. Follow up analyses suggested these results to be explained by the fact that the AGDS sample is also highly educated whereas the education levels of the controls were more concordant with the Australian population [24]. We argue future analyses can be made more robust by identifying biases both on the phenotypic and genetic level, and either contextualising them as a potential limitation or performing sensitivity analyses adjusting for them.
Data Availability
Summary data on effects described in this manuscript is available in the results section. GWAS summary statistics used in this study are publicly available and have been disclosed in the methods but are also available upon request. Code related to this study data analysis is available upon request from the authors. Access to the AGDS individual level data is restricted due to the ethical guidelines governing the study.
Competing interests
IBH has been: Commissioner of Australia’s National Mental Health Commission (2012–2018); Co-director of Health & Policy at the Brain & Mind Centre, University of Sydney; leading community-based and pharmaceutical industry-supported projects (Wyeth, Eli Lilly, Servier, Pfizer, AstraZeneca) focused on the identification and better management of anxiety and depression; a member of the Medical Advisory Panel for Medibank Private until October 2017; a board member of Psychosis Australia Trust; a member of the Veterans Mental Health Clinical Reference Group; and Chief Scientific Advisor to and an equity shareholder in Innowell. LG, NGM, PAL, SEM,PL, LCC,, ML, LMGM, NRW, LY, EMB, MER and AC have nothing to disclose.
Author Contributions
AIC and NGM designed the study. LG, SDT, LMGM and CY performed the analyses with supervision and guidance from AIC. EMB, PAL, NRW, SEM, IBH, and NGM implemented and supervised the data collection for the AGDS. LCC, LY, and MKL critically appraised the manuscript and gave input in the design and implementation of the analyses. LG, AIC and MER contributed to the first draft of the manuscript and all authors contributed to editing and drafting the manuscript prior to submission.
Acknowledgements
Data collection for AGDS was possible thanks to funding from the Australian National Health & Medical Research Council (NHMRC) to NGM, NRW, SEM, IHB, EMB, PAL (GNT1086683) and Medical Research Future Fund (APP1200644). We thank our colleagues Richard Parker, Simone Cross, Scott Gordon and Lenore Sullivan for their valuable work coordinating all the administrative and operational aspects of the AGDS project. NRW thanks the support of NHMRC through grants 1113400 and 1173790. MER thanks the support of NHMRC and the Australian Research Council (ARC) through an NHMRC-ARC Dementia Research Development Fellowship (GNT1102821). SEM is supported in part by NHMRC investigator grant APP1172917. The views expressed are those of the authors and not necessarily those of the affiliated or funding institutions.