Abstract
Objectives To evaluate the benefit of combining polygenic risk scores (PRS) with the QCancer-10 (colorectal cancer) non-genetic risk prediction model to identify those at highest risk of colorectal cancer (CRC).
Design Population based cohort study. Six different PRS for CRC were developed (using LDpred2 PRS software, clumping and thresholding approaches, and genome-wide significant models). The top-performing genome-wide and GWAS-significant PRS were then combined with QCancer-10 and performance compared to QCancer-10 alone. Case-control (logistic regression) and time-to-event (Cox proportional hazards) analyses were used to evaluate risk model performance in men and women.
Setting and participants UK Biobank Study. A total of 434587 individuals with complete genetic and QCancer-10 predictor data were included in the QCancer-10+PRS modelling cohorts.
Main outcome measures Prediction of colorectal cancer diagnosis by genetic, non-genetic and combined risk models.
Findings PRS derived using the LDpred2 program performed best, with an odds-ratio per standard deviation of 1.58, and top age- and sex-adjusted C-statistic of 0.733 (95% confidence interval 0.710 to 0.753) in logistic regression models in the validation cohort. Integrated QCancer-10+PRS models out-performed QCancer-10 alone. In men, the integrated LDpred2 (QCancer-10+LDP) model produced a C-statistic of 0.730 (0.720 to 0.741) and explained variation of 28.1% (26.3% to 30.0%), compared with 0.693 (0.682 to 0.704) and 21.0% (18.9% to 23.1%) for QCancer-10 alone. Performance improvements in women were similar. In the top 20% of individuals at highest absolute risk, the sensitivity of QCancer-10+LDP models for predicting CRC diagnosis within 5 years was 47.6% in men and 42.5% in women, with respective 3.49-fold and 2.75-fold absolute increases in the top 5% of risk compared to average. Decision curve analysis showed that adding PRS to QCancer-10 improved net-benefit and interventions avoided, across most probability thresholds.
Conclusions Integrating PRS with QCancer-10 significantly improves risk prediction over QCancer-10 alone. Evaluation of risk stratified population screening using this approach is warranted.
What is already known on this topic
Risk stratification based on genetic or environmental risk factors could improve cancer screening outcomes
No previously published study has examined integrated models combining genome-wide PRS and non-genetic risk factors beyond age
QCancer-10 (colorectal cancer) is the top-performing non-genetic risk prediction model for CRC
What this study adds
Adding PRS to the QCancer-10 (colorectal cancer) risk prediction model improves performance and clinical benefit, with greatest gain from the LDpred2 genome-wide PRS, to a level that suggests utility in stratifying CRC screening and prevention
Introduction
Colorectal cancer (CRC) is the fourth most common cancer in the United Kingdom (UK), with increasing incidence in younger ages and countries with historically lower rates. 1 Population screening is effective in reducing CRC incidence and mortality, through detection and removal of pre-malignant adenomas, and earlier detection of cancers. Screening modalities vary internationally. While colonoscopy is the gold-standard, it is expensive, invasive and time consuming. Many countries have adopted a staged process, with initial faecal blood testing, followed by colonoscopy for those who test positive. Risk-stratified approaches to screening direct resources to those at highest risk, have the potential to improve screening detection rates, reduce investigative burden of those at lower risk, and potentially improve cost-effectiveness. 2 Improved understanding of cancer risk could also improve informed consent and shared decision making around screening participation.
Both genetic and non-genetic factors contribute to an individual’s risk of CRC, some of the latter being modifiable. The top-performing non-genetic risk model in external validation is QCancer-10 (colorectal cancer), 3,4 which has been recommended as a tool to guide shared decision making around CRC screening. 5 QCancer-10 is a 15-year CRC prediction model, developed using the QResearch linked primary care database of almost 5 million individuals aged 25-84, registered at QResearch practices across England between 1998 and 2013. 4 It is based on age, ethnicity, family history, alcohol and smoking status, a small number of medical conditions, and for men, Townsend deprivation score and body mass index (BMI). As the predictors are derived from electronic health records, it could be embedded at point of care, and linked with screening records to facilitate risk stratification within the bowel screening programme.
Genetic variants known to predispose to CRC are mostly single nucleotide polymorphisms (SNPs) identified as significant in genome-wide association studies (GWAS). Genetic risk can be summarised in a polygenic risk score (PRS). Most existing PRS have used a limited set (typically tens) of risk SNPs that have achieved formal statistical significance in GWAS, with genotypes weighted by predicted effect sizes. 6 More recently, “genome-wide PRS” have incorporated many more SNPs than those reaching GWAS-significance, based on the notion that many true risk SNPs remain unidentified. These models have generally produced better performance than “GWAS-significant” models, but evaluation in CRC has been limited. 7,8 A further issue is that several previous evaluations of CRC PRS in the UK Biobank Study (UKB) are based on summary statistics derived from a GWAS meta-analyses which included UKB. 8,9 This overlap results in overfitting of models, and optimism in performance estimates. 10
Integrated models for CRC, which to date have combined GWAS-significant PRS with non-genetic risk factors, generally perform better than non-genetic models or PRS alone. 6,9 We hypothesised that integrating PRS with QCancer-10 would provide enhanced risk prediction, and that genome-wide PRS would give greatest benefit. We used UKB to develop and compare PRS using several genome-wide and GWAS-significant approaches, minimising overfitting and optimism by using summary data which did not overlap with the UKB dataset. We validated PRS performance in Geographic and Minority Ethnic Validation Cohorts. We then derived integrated QCancer-10+PRS risk models, using the top-performing genome-wide PRS and the GWAS-significant PRS, which we internally validated and compared with QCancer-10.
Methods
Overview
We conducted a development and validation study of PRS and integrated PRS-epidemiological models, to predict risk of CRC in a set of UK individuals of bowel cancer screening age. We followed the PRS-RS and TRIPOD reporting guidelines for PRS and prediction modelling. 11,12
We used UKB to derive and validate our risk models, under application number 8508. 13 In brief, just over 500,000 participants aged 40 to 69 (5.5% of invitees) were recruited to UKB from the general population across the UK between 2006 and 2010. 14 Baseline demographics, medical, lifestyle and physical data, and blood samples, were collected at recruitment. Follow-up through linked hospital, general practice and registry data is ongoing. A detailed description of genetic resources including quality control measures can be found in Bycroft et al. 13 and Supplementary Methods. In summary, participants were genotyped on one of two arrays (49950 individuals on the Applied Biosystems UK BiLEVE Axiom Array and the remainder on the Applied Biosystems UK Biobank Axiom Array) which share over 95% content. Following quality control, phasing was carried out using SHAPEIT3 with 1000 Genomes Phase 3 as a reference panel, followed by imputation using IMPUTE4 with the Haplotype Reference Consortium (HRC) dataset as the main reference panel, and secondarily with merged UK10K and 1000 Genomes phase 3 reference panels, and the datasets combined. SNP annotation was based on the GRCh37 assembly of the human genome.
Outcomes
The primary outcome in all models was CRC diagnosis, identified through self-report at UKB enrolment visit and ICD-9 (153, 154.0, 154.1) and ICD-10 (C18-C20) codes in linked cancer and death registry and hospital data. For PRS development and evaluation in logistic regression models, we included incident and prevalent cases, with the remaining cohort used as controls. For survival analysis using Cox proportional hazards (Cox) models, prevalent cases with a diagnosis prior to cohort entry were excluded. Follow-up began at date of enrolment, and was censored at the earliest of date of incident CRC, loss to follow-up, death, or end of available registry follow-up (31 st October 2015 for Scottish participants, 13 th March 2016 for all other participants).
We calculated age-specific and directly standardised CRC incidence rates in UKB overall and for the Integrated Modelling Cohort, and compared these with Office for National Statistics 2013 cancer registry data for England (chosen as the approximate mid-point of available UKB follow-up). 15 Age-specific rates were calculated in 5 year age bands between 40 and 80 as the number of first incident CRCs over the number of person years at risk. Age-standardised incidence rates were calculated using the 2013 European Standard Population aged 40-80. Rates are presented per 100000 person years at risk (see Supplementary Methods).
Polygenic risk scores
We meta-analysed summary data from 14 CRC GWAS cohorts (which did not include UKB) to provide SNP association effect sizes (see Supplementary Methods and Ref.16). There were 26397 cases and 41481 controls, all of European ancestry based on principal components analysis (PCA). The meta-analysis was performed using the META package (v1.7), 17 including SNPs imputed with an imputation quality (INFO) score > 0.8 from each dataset, using the fixed-effects inverse-variance method.
Three broad approaches to PRS development were evaluated (see Supplementary Methods). Firstly, we used a ‘standard’ PRS (hereafter ‘GWAS-sig’), which comprised a manually curated list of 50 sentinel SNPs shown in recent European GWAS-meta-analyses 16,18 to be independently and reproducibly associated with CRC risk at P<5×10 -8 in our meta-analysis. This PRS was constructed as a log-additive sum of SNP dosages weighted by their betas. Betas were adjusted for winner’s curse using FIQT correction. 19
Secondly, genome-wide clumping and thresholding methodologies were evaluated using ‘standard’ (C+T) and ‘stacked’ (SCT) approaches. 20
Thirdly, we used LDpred2, 21 which takes a Bayesian approach to SNP selection, accounting for linkage disequilibrium between the SNPs. We used three different LDpred2 options – an infinitesimal model (LDpred2-Inf), a non-sparse grid model (LDpred2-grid) and a sparse grid model (LDpred2-grid-sp).
Figure 1 shows the per-person quality control (QC) measures for the genetic data and sample exclusions for each modelling cohort. We used imputed dosage data from UK Biobank, and restricted SNPs to those included in the HapMap3 reference dataset, and with matched SNPs in the base data. Following QC, 1104409 SNPs were available for PRS development (Supplementary Methods).
UK Biobank participant flow diagram. Panel A shows quality control and derivation of PRS modelling cohorts. Panel B shows participant selection for the Integrated Modelling Cohorts. *More than one exclusion may apply per person.
Optimal PRS tuning parameters for genome-wide approaches were selected in the Training Cohort (n=30000, 446 cases; Supplementary Methods, Table S4). For each optimal PRS, we assessed association with CRC risk in logistic regression and Cox risk models in the Test cohort (n=280664; 4230 cases), adjusting for age, sex, genotyping array and the first four principal components (PCs) from UKB. We tested for interactions between age and PRS. Training and Test Cohorts included participants of white-British ancestry (identified through self-reported ethnicity and genetic information) 13 from England and Wales (Figure 1). Case prevalence was 1.5% in both cohorts. We compared performance to a reference model containing age, sex, genotyping array and four PCs, without the PRS. We also evaluated performance without age and sex in the model.
We reported the distribution of standardised PRS and adjusted odd ratios and hazard ratios per-standard deviation (Supplementary Methods). We used the C-statistic (Harrell’s C-index for Cox models) and Somers’ Dxy statistic to assess discrimination, in addition to Royston’s D statistic and separation of Kaplan-Meier curves across four risk groups (cut at 16th, 50th and 84th centiles, approximating to the mean and 1 standard deviation) 22 for Cox models. Nagelkerke’s R2 was used in logistic regression models and Royston and Sauerbrei’s in Cox models to assess variance explained, and R2 attributable to the PRS was calculated by R2 (full model) R2-(reference model). These measures were evaluated over the follow-up time of the cohort for Cox models. Scaled Brier scores (derived from the Brier score scaled to the maximum possible score for a given dataset, where a higher % score indicates better performance), 23 were used to assess overall model performance, calculated at 8 years of follow-up for Cox models. Each model was internally validated. Confidence intervals and internal validation used 500 bootstrap samples.
Prior to external validation, models were adjusted for optimism. The optimism-adjusted calibration slope was used as a global shrinkage factor to adjust the regression coefficients, and the intercept or baseline survival function was re-estimated (by refitting the model with the adjusted linear predictor as an offset). 24 Adjusted PRS models were then applied to a Geographic Validation Cohort, comprising Scottish participants with European ancestry, and a Minority Ethnic Validation Cohort (from any region). The null hypothesis of no difference in performance statistics between models was tested using paired t-tests with Bonferroni correction for multiple comparisons. In addition to the performance metrics described above, calibration was assessed through the calibration slope and visual assessment of calibration plots, with calibration-in-the-large for logistic regression models. Pre-specified subgroup analyses were performed in the Geographic Validation Cohort by sex, in those with a first degree family history of CRC, and by age (Supplementary Methods). We evaluated potential improvements in calibration in validation datasets obtained through recalibration-in-the-large (in which the intercept or baseline survival function is re-estimated in the new dataset).
Development of QCancer-10+PRS combined models
Coding of QCancer-10 predictors in UK Biobank were matched as closely as possible to the original model. 4 Ethnicity, previous medical history, alcohol and smoking status, and family history were all obtained from self-reported data in baseline touch-screen responses and verbal interviews at UKB assessment centres. Mapping of QCancer-10 predictors to UK Biobank data and coding of predictors is described in Supplementary Methods and Tables S1 and S2.
The Integrated Modelling Cohort used for QCancer-10 validation and integrated model development comprised all individuals with imputed genetic data passing QC, excluding the 30000 individuals used for PRS hyper-parameter selection (Supplementary Methods), and with complete QCancer-10 predictor data (Figure 1). Since missingness was <5% for all predictors (Table S3), we used complete case analysis. Sample size adequacy for integrated model development was calculated following Riley et al. 25 (Supplementary Methods).
We validated QCancer-10 performance in UKB, and recalibrated the model for the UKB dataset through recalibration-in-the-large. Full QCancer-10 model specification is available at https://www.qcancer.org/15yr/colorectal/. We then developed integrated models including the risk score from QCancer-10 plus either the top-performing genome-wide PRS (based on the maximum C-statistic and R2 in external validation) or the GWAS-significant PRS, using Cox models, developed in men and women separately. Inspection of Schoenfeld residuals showed that the proportional hazard assumption held. We evaluated the use of multiple fractional polynomials to model the predictors, ultimately using a fractional polynomial term to model the genome-wide PRS in the model for females. We assessed for possible interactions between the predictors by visual inspection of plots of marginal effects of the QCancer-10 risk score across PRS values, and examining the prognostic strength and significance of interaction terms based on Wald χ2statistics.
We used the same metrics to assess the original QCancer-10 model and QCancer-10+PRS model performance as described for Cox PRS models, with paired t-tests to compare models as above. Confidence intervals and internal validation used 500 bootstrap samples. Pre-specified subgroup analyses for QCancer-10 and QCancer-10+PRS included those with a first degree family history of CRC, individuals from minority ethnic backgrounds, and calibration by age.
Model sensitivities were evaluated by calculating the proportion of cases identified at centile thresholds for absolute risk and relative risk. Relative risks were calculated relative to an individual of the same age and sex, mean PRS (by sex), mean PCs, BMI of 25, white ethnicity, mean Townsend score, and no other CRC risk factors. We used decision curve analyses to compare the net benefit and interventions avoided using QCancer-10 and QCancer-10+PRS models. For decision curve and subgroup analyses, QCancer-10+PRS models were first adjusted for optimism, and recalibrated QCancer-10 models were used.
Statistical analysis was performed using R/3.6.2. 26
Patient and Public Involvement
Patients and public were not involved in the design, conduct, reporting or dissemination plans of this study.
Results
Demographics for the UKB-derived Integrated Modelling Cohort are shown in Table 1. Table S3 gives these values, including numbers not reported, for the whole UKB cohort; characteristics of each PRS cohort are shown in Table S4. Age-standardised CRC incidence from linked cancer-registry data in the whole UKB cohort was 108.3 and 73.9 cases per 100000 person years at risk for men and women respectively, compared to 127.8 and 80.7 respectively in ONS data. 15 Incidence in the Integrated Modelling Cohort, with cases identified through all linked data, was 118.0 CRCs per 100,000 years follow-up in men and 79.3 in women. Age-specific incidence rates in UKB (Figure S3) closely followed those from ONS until the age of 70, after which UKB rates were lower.
Demographic data and medical conditions included in QCancer-10 models, in male and female Integrated Modelling Cohorts, and in cases. Values are numbers (%) unless otherwise indicated. CRC – colorectal cancer, IQR – interquartile range, NA – not applicable. *not included in model for females but provided for information.
Polygenic risk score models
Of the 6 PRS models assessed (Figure S4), LDpred2-grid had the highest ORs per SD of PRS (1.584, 95% confidence interval 1.536 to 1.633; Table 2) and performed best in the test cohort with C-statistic 0.717 (0.711 to 0.725) and R2 6.3% (5.9 to 6.8%) (Table 2). A weak interaction between age and PRS was noted with a reduced effect size of PRS with increasing age (Table S5, Figure S5), but was not included in the models. Genome-wide models performed better than the GWAS-sig model, and all PRS showed improved performance over the reference model of age, sex, genotyping array and four PCs (Table 2). Performance without adjustment for age and sex is shown in Table S6. Internal validation showed low bias in all measures (Table 2).
Apparent, internally and externally validated polygenic risk score (PRS) performance in logistic regression models (adjusting for age, sex, genotyping array and first 4 principal components). Values are performance indices plus 95% confidence intervals. Internal validation used 500 bootstrap samples. LDpred2-inf – LDpred2 infinitesimal model; LDpred2-grid – LDpred2 grid model; LDpred2-grid-sp – LDpred2 sparse grid model; SCT – stacked clumping and thresholding; C+T – clumping and thresholding; GWAS-sig – GWAS significant; PRS OR per SD – odds ratio per standard deviation of polygenic risk score in the age- and sex-adjusted model; C – C statistic; Dxy – Somers’ Dxy rank correlation; R2 – Nagelkerke’s R2(explained variation); Slope – Calibration Slope; CITL – calibration-in-the-large. * R2 for all models in the Minority Ethnic Validation Cohort <0 (indicating poorer performance than a model with no explanatory variables). Pairwise comparisons of performance metrics in validation cohorts were all significantly different P<0.001 except comparisons marked #P=0.002, ##P=0.005, ^P=0.01, *P=1.
In the Geographic Validation Cohort, discrimination and variation explained improved compared to the Test Cohort for all models. LDpred2-derived models performed best, and all genome-wide models showed improved performance over the GWAS-sig model (Table 2). All models under-predicted risk (CITL >0, Table 2) particularly in the highest PRS groups (Figure S6), and genome-wide models were slightly over-fitted (calibration slope >1, i.e. insufficient variation at the extremes of prediction, Table 2, Figure S6).
In subgroup analyses of logistic regression models (Table S7, Figure S7), discrimination and explained variation were better in males; models were better fitted in females but under-predicted risk to a greater extent, particularly in higher risk groups. Discrimination and variation explained were poorer in individuals with a first-degree family history of CRC, with models systematically underpredicting risk across PRS risk groups. All models tended to under-predict risk across age groups (Figure S8).
Performance was poor in the Minority Ethnic Validation Cohort (Table 2). Models systematically under-predicted risk and were highly over-fitted (i.e. predictions were too extreme, Table 2), with modest improvement following recalibration (Figure S6).
In general, PRS performance in Cox models supported the logistic regression analysis (Tables S8-9, Figures S9-14).
QCancer-10 non-genetic model
QCancer-10 models risk in males and females separately. Comparative demographics of the original QCancer-10 derivation cohort and the Integrated Modelling Cohort are shown in Table S10. Notably the Integrated Modelling Cohort is older, less ethnically diverse, has a lower Townsend score, fewer smokers, and higher reported family history of CRC, than the QCancer-10 cohort. Model performance was in line with previously published studies (Table 3). 3 As expected, the model for females performed less well than the model for males. 3 Both models tended to over-predict risk, which was corrected through recalibration, though in women the model continued to over-predict in the top risk decile (Figure S15). In subgroup analysis, models were well calibrated across age groups; they underpredicted risk in individuals from minority ethnic backgrounds; and the model for females tended to over-predict risk in those with a first-degree family history of CRC, particularly in higher risk groups (Table S11, Figures S16-S17).
Apparent and internally validated performance of QCancer-10+LDP and QCancer-10+GWS models, compared with external validation of QCancer-10 in the same participants. Values are performance indices plus 95% confidence intervals. QCancer-10/PRS HR per SD – adjusted hazard ratio of QCancer-10 score or PRS in model per standard deviation of the PRS; C – Harrell’s C statistic; Dxy – Somers’ Dxy rank correlation; D – Royston’s D statistic; R2D – Royston and Sauerbrei’s (explained variation); Slope – Calibration Slope. Pairwise comparisons of performance metrics were all significantly different P<0.001. *modelled using multiple fractional polynomial and therefore not presented.
QCancer-10+PRS models
Given the similarities in performance of LDpred2-grid and LDpred2-grid-sp models, we selected LDpred2-grid-sp as the top-performing genome-wide PRS for integrated modelling with QCancer-10, favouring sparsity (i.e. a PRS containing fewer SNPs; see Supplementary Results for full model specifications and baseline hazards). We found evidence of an interaction between QCancer-10 and the GWAS-significant PRS in men (P interaction = 0.004), with reduced effect of QCancer-10 score at higher PRS, but did not ultimately include this in the model given the relative weakness of the interaction (Table S12, Figure S18).
Cox models combining the QCancer-10 risk score with LDpred2-grid-sp (QCancer-10+LDP), and the GWAS-sig PRS (QCancer-10+GWS) both out-performed QCancer-10 (Table 3). Figure 2 shows Kaplan-Meier curves across 4 risk groups in integrated QCancer-10+PRS models compared to QCancer-10 alone, demonstrating improved separation between risk groups with the addition of PRS. Internal validation of the QCancer-10+PRS models showed very little optimism in performance estimates.
Kaplan-Meier curves across four risk groups (group 4 being highest risk) for QCancer-10+LDP and QCancer-10+GWS models compared to QCancer-10 in men and women. QCa – QCancer-10 model; QCa+LDP – Qcancer-10+LDP model; QCa+GWS – QCancer-10+GWS model.
Models predicting risk in men had better discrimination, and explained more of the variation in risk than models for women (Table 3). Calibration by age was good (Figure S16), with slight under-prediction of risk in the top age group in women. As with QCancer-10, in those with a first degree family history of CRC, female QCancer-10+PRS models tended to over-predict risk, particularly in higher risk groups; male QCancer-10+PRS models were well calibrated (Table S11, Figure S17). In minority ethnicities, QCancer-10+PRS models underpredicted risk (Table S11) to a greater extent than QCancer-10, subject to the caveat of a low CRC case numbers (46 men, 58 in women) in this subgroup.
QCancer-10+LDP consistently provided the best risk prediction. Table 4 shows the sensitivity of the Qancer-10+LDP model in predicting CRC risk over five years across the top 25 centiles of absolute risk. To illustrate, individuals predicted to be in the top 20% of absolute risk by QCancer-10+LDP accounted for 47.6% of male cases and 42.4% of female cases. QCancer-10 and QCancer-10+GWS had lower sensitivity than QCancer-10+LDP (Tables S13-S16). Men in the top 5% of absolute risk by the QCancer-10+LDP model had >3.49-fold increased absolute 5-year risk compared to the median, with a comparable 2.75-fold increase in women. For QCancer-10+GWS this was 3.14-fold and 2.37-fold in men and women respectively, and for QCancer-10 2.37 and 2.06-fold (Table S17). Decision curve analyses confirmed that, across a wide range of probability thresholds, QCancer-10+LDP gave greater net benefit than QCancer-10+GWS and QCancer-10 for both men and women (Figure 3), and predicted a greater number of interventions avoided across clinically relevant thresholds.
The absolute risk of the top 20% are highlighted.
Figures show net benefit in men (A) and women (B), and interventions avoided per 100 patients tested in men (C) and women (D). The thin grey line in net benefit curves indicates intervention for all, the thick black line no intervention.
By way of illustration, enhanced screening is frequently offered for those with a single first degree relative with CRC (FDRCRC), corresponding to a ∼2.2-fold increased risk. 27 QCancer-10+LDP identified 18.2% of men (34.0% of cases) and 7.2% of women (16.5% of cases) as having a relative risk >2.2, of whom 76% and 70% respectively had no FDRCRC (see Table S18 for equivalent values for QCancer-10+GWS and QCancer-10).
Discussion
We have undertaken the first study to develop and validate new prediction models for colorectal cancer that combine phenotypic risk with genome-wide PRS. We show that combining the non-genetic QCancer-10 model with either genome-wide or GWAS-significant PRS significantly improves model performance and clinical benefit, with greatest improvements seen in the QCancer-10+LDP model. The QCancer-10+LDP models have higher discrimination in UKB than any previously published CRC risk score. 6,9
Our models could be used to improve or instigate risk-stratified CRC screening. QCancer-10 4 has recently been recommended to guide shared decision making around CRC screening. 5 Our QCancer-10+PRS risk models would provide more accurate risk information for patients and healthcare professionals on which to base screening decisions. Our study predicts that QCancer-10+PRS models have a greater net-benefit and avoid more interventions than QCancer-10 across a wide range of clinically-relevant risk thresholds, with the greatest benefit from QCancer-10+LDP. The sensitivities achieved using QCancer-10+PRS exceed those of other integrated models recently validated in UKBioank. 6 The genome-wide SNP genotyping required for LDpred2 is reliably performed from saliva samples, and is rapid, inexpensive and straightforward to analyse. The sensitivities and decision curves provided by QCancer-10+LDP could therefore be used to inform clinical decision making.
Of the PRS method evaluated, LDpred2-grid and LDpred2-grid-sp models had highest discrimination, explained more of the variation in risk, and were well calibrated. The improvement in performance between the derivation and validation cohorts when using the PRS models probably r esults from lower genetic homogeneity in the latter. Evaluation of the PRS in a geographically external cohort demonstrates portability of the models. The Geographic Validation Cohort was well matched in age to the derivation cohort, but had a higher proportion of women; prevalence of CRC was higher, at 1.79% compared to 1.51% in the derivation cohort. We would expect performance in Northern European individuals in the general population to be similar to that of the Validation Cohort.
Our PRS findings are in line with a recent study in which a PRS derived using LDpred software (an earlier version of LDpred2) out-performed both machine learning approaches and a 140 GWAS-significant SNP PRS. 7 In contrast, a recent study comparing Lassosum software with clumping and thresholding approaches in UK Biobank found the optimal CRC PRS to contain just 87 SNPs, with an age- and sex-adjusted AUC of just 0.617. 8 Several genome-wide PRS software tools are now available, with differences in performance across disease types, 21 highlighting the importance of evaluating multiple approaches in different phenotypes. Previous studies have found that models combining GWAS-significant PRS and non-genetic risk predictors perform better than PRS alone 6 or non-genetic risk factors alone. 9 Our work supports and extends this by demonstrating the stepwise improvement in performance obtained with genome-wide PRS.
A key strength of our study is the avoidance of overlap between our GWAS meta-analysis datasets and modelling cohorts, thus reducing overfitting of the PRS and performance optimism. 10 We used expected genotype dosages rather than allele counts in each PRS, incorporating uncertainty in genotype imputation, and applied correction for ascertainment bias to effect sizes in the GWAS-significant model. Our GWAS-significant PRS used stringent inclusion criteria, including only SNPs which replicated in our UKB-free meta-analysis.
UKB provides a large sample size, extensive phenotyping, completeness of data recording, and linkage to external datasets. Linkage to cancer registry data in UKB ended in 2015/16 at the time of our study, so we have only been able to follow-up for a median 7 years; updating this will improve risk estimates and permit estimation of risks over a longer follow-up. The UKB age range of ∼40-70 is similar to that of bowel cancer screening (50 to 74 in England and Scotland), but narrower than 25 to 84 used in the original Qcancer-10 study. 4 However, model performance in UKB is arguably unlikely to reflect relative performance in the general population, for several reasons. Model performance will vary between populations with different prevalence or risk of a disease – known as the ‘spectrum effect’. As UKB has a lower incidence of disease than the general population of screening age, one might expect sensitivity to increase (which is of benefit in a screening test) when applied to a population with higher risk. 28 Furthermore, all of our models appeared to perform less well in females. For PRS, wide confidence intervals in the Geographic Validation Cohort mean this finding should be interpreted with caution, but for models that include QCancer-10, this difference was not unexpected. The known healthy volunteer bias that exists in UKB is especially marked in women (for example, the reduction in all-cause mortality and overall cancer incidence in UKB relative to the general population is greater for women than men). 29 In addition, the available sample size and number of incident cases for women in our Integrated Modelling Cohort fell slightly short of sample size requirements (see Supplementary Methods). As a result our estimates of risk may be less precise for women, and external validation would be essential prior to implementation. The QCancer-10 model has previously been shown to perform worse when validated in UKB than in the QResearch validation. 4 This is likely to be due to the differences in age distribution between the general population sample used to develop the original QCancer-10 score and the more restricted UKB sample in this study. External validation of a separate QCancer (colorectal) score for symptomatic patients in an independent population-based cohort showed comparable performance to the discovery study. 30 Overall, risk model performance should be validated in a population representative of the screening population, and we have shown that PRS calibration can be largely corrected in new (ethnically similar) populations by recalibration.
Further limitations of our study may include unknown differences in the demographics of the contributing base GWAS datasets and UKB. We did not include Mendelian CRC syndromes in the genetic model, and doing so would almost certainly improve risk prediction. Another major limitation of our study, and PRS generally, is that most models are developed in individuals of European ethnicity. Although most CRC risk SNPs appear to be shared across ethnic groups, quantitative risk estimates cannot readily be transferred across populations, 31 and, as anticipated, our PRS performed poorly in the Minority Ethnic Validation Cohort. 31-33 As minority ethnic populations often have higher CRC associated mortality and lower screening uptake, 34 further work is urgently needed to expand PRS for CRC in these populations to avoid exacerbating existing health inequalities.
With a renewed focus on prevention and early diagnosis in healthcare, 35 refined risk prediction is likely to have a significant role. It has been proposed that risk scores could be used to improve diagnosis in symptomatic individuals (for example, https://ourfuturehealth.org.uk/). However, the combined risk score alone is unlikely to achieve the performance standards required for diagnostic tests, and would need to be evaluated carefully in combination with quantitative faecal immunochemical test (FIT) results, symptoms and clinical signs.
In population cancer screening programmes, in contrast, a risk score with moderate predictive value has considerable potential for improving current performance through patient stratification. Our risk score predicts that ∼10% of the population aged ∼40-70 have relative risks of CRC high enough to warrant enhanced surveillance under guidelines used in familial risk. 36 The risk models constructed here perform at a level that may well be clinically useful in population screening. 37 In practice, we envisage that risk scores are likely to be used alongside FIT to allocate colonoscopy more effectively, maintaining universal access to screening whilst improving performance. Nevertheless, we emphasise that validation of the risk score in a cohort representative of the screening population and evaluation in a prospective trial are required to evaluate the performance, acceptability and cost effectiveness of a combined risk model in a FIT-based bowel cancer screening programme.
Data Availability
UK Biobank data can be obtained through http://www.ukbiobank.ac.uk/. Genotype data are available in the European Genome-phenome Archive under accession numbers EGAS00001005412, EGAS00001005421, or from the Edinburgh University DataShare Repository (https://datashare.ed.ac.uk/). Finnish cohort samples can be requested from the THL Biobank https://thl.fi/en/web/thl-biobank. PRS SNP inclusion lists and model specifications will be deposited in the PGS catalogue repository (https://www.pgscatalog.org/). Risk scores for UKB participants will be returned to UK Biobank for use by approved researchers.
Contributors
All authors contributed to study conception and design, with development of PRS and statistical analysis led by SEWB, IT and JHC. IT, MD and RH provided data. SEWB, IT, PL and RH have accessed and verified the underlying data. SEWB carried out primary data analysis. SEWB completed the statistical analysis under supervision of IT and JHC. IT, JCH, SE and JEE supervised the project. SEWB and IT wrote the first draft of the manuscript. All authors contributed to critical revision and editing of the manuscript, and have approved the final version. JHC and IT are guarantors.
Ethics Approval
This research has been conducted using data from UK Biobank, a major biomedical database, http://www.ukbiobank.ac.uk/. The UK Biobank study has ethical approval from the North West Multi-centre Research Ethics Committee (16/NW/0274). This study was performed under UK Biobank application number 8508. All contributing GWAS studies were undertaken with ethical review board approval at respective study centres as detailed in Law et al.16
Funding/acknowledgements
We thank all individuals who agreed to participate in the contributing GWAS studies and in UK Biobank, and the investigators, research associates and wider teams involved in these studies. We thank the authors of LDpred2 for their instructive PRS tutorial and publicly available code.
SB is supported by an MRC Clinical Research Training Fellowship (MR/P001106/1). JEE and SW receive funding from the Oxford NIHR Biomedical Research Centre (BRC). This work of the Houlston Laboratory (PL, RH) is supported by a grant from Cancer Research UK (CRUK) (C1298/A25514). JHC received funding from the John Fell Oxford University Press Research Fund, grants from CRUK grant number C5255/A18085, through the CRUK Oxford Centre, grants from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z) and other research councils, during the conduct of the study. MD is funded by CRUK Programme Grant C348/A12076. IT is funded by CR-UK Programme Grant C6199/A27327. The research was supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z with funding from the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. Funding bodies had no role in the design, analysis, writing or decision to submit.
Declaration of interests
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work other than that listed above; JHC is founder and shareholder of ClinRisk Ltd which supplies free open-source software for research purposes and also licenses other closed source software to implement risk prediction tools into NHS computer systems outside the submitted work and was its medical director until June 2019, JEE has served on clinical advisory boards for Lumendi, Boston Scientific, and Paion, and has served on the clinical advisory board and owns share options in Satisfai Health, and reports speaker fees from Falk; JHC is director of the QResearch database – a not-for-profit collaboration between University of Oxford and EMIS (commercial supplier of NHS computer systems) and is an adviser to the CMO in England on cancer screening, JEE serves on the ACPGBI / BSG guideline group for implementation FIT for the detection of CRC in patients with symptoms suspicious of CRC.
Data Sharing Statement
UK Biobank data can be obtained through http://www.ukbiobank.ac.uk/. Genotype data are available in the European Genome-phenome Archive under accession numbers EGAS00001005412, EGAS00001005421, or from the Edinburgh University DataShare Repository (https://datashare.ed.ac.uk/). Finnish cohort samples can be requested from the THL Biobank https://thl.fi/en/web/thl-biobank. PRS SNP inclusion lists and model specifications will be deposited in the PGS catalogue repository (https://www.pgscatalog.org/). Risk scores for UKB participants will be returned to UK Biobank for use by approved researchers.
Transparency
The lead author (IT) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; that discrepancies from the study as planned have been explained, and that the paper conforms to transparency policy of the International Committee of Medical Journal Editors uniform requirement for manuscripts submitted to biomedical journals.
Footnotes
Manuscript updated provide more methodology in main paper; additional statistical comparisons of model performance added; Table 3 moved to supplementary data