Adenomas from individuals with pathogenic biallelic variants in the MUTYH and NTHL1 genes demonstrate base excision repair tumour mutational signature profiles similar to colorectal cancers, expanding potential diagnostic and variant classification applications ===================================================================================================================================================================================================================================================================== * Romy Walker * Jihoon E. Joo * Khalid Mahmood * Mark Clendenning * Julia Como * Susan G. Preston * Sharelle Joseland * Bernard J. Pope * Ana B. D. Medeiros * Brenely V. Murillo * Nicholas Pachter * Kevin Sweet * Allan D. Spigelman * Alexandra Groves * Margaret Gleeson * Krzysztof Bernatowicz * Nicola Poplawski * Lesley Andrews * Emma Healey * Steven Gallinger * Robert C. Grant * Aung K. Win * John L. Hopper * Mark A. Jenkins * Giovana T. Torrezan * Christophe Rosty * Finlay A. Macrae * Ingrid M. Winship * Daniel D. Buchanan * Peter Georgeson ## ABSTRACT **Background** Colorectal cancers (CRCs) from people with biallelic germline likely pathogenic/pathogenic variants in *MUTYH* or *NTHL1* exhibit specific single base substitution (SBS) mutational signatures, namely combined SBS18 and SBS36 (SBS18+SBS36), and SBS30, respectively. The aim was to determine if adenomas from biallelic cases demonstrated these mutational signatures at diagnostic levels. **Methods** Whole-exome sequencing of FFPE tissue and matched blood-derived DNA was performed on 9 adenomas and 15 CRCs from 13 biallelic *MUTYH* cases, on 7 adenomas and 2 CRCs from 5 biallelic *NTHL1* cases and on 27 adenomas and 26 CRCs from 46 non-hereditary (sporadic) participants. All samples were assessed for COSMIC v3.2 SBS mutational signatures. **Results** In biallelic *MUTYH* cases, SBS18+SBS36 signature proportions in adenomas (mean±standard deviation, 65.6%±29.6%) were not significantly different to those observed in CRCs (76.2%±20.5%, *p-value*=0.37), but were significantly higher compared with non-hereditary adenomas (7.6%±7.0%, *p-value*=3.4x10-4). Similarly, in biallelic *NTHL1* cases, SBS30 signature proportions in adenomas (74.5%±9.4%) were similar to those in CRCs (78.8%±2.4%) but significantly higher compared with non-hereditary adenomas (2.8%±3.6%, *p-value*=5.1x10-7). Additionally, a compound heterozygote with the c.1187G>A p.(Gly396Asp) pathogenic variant and the c.533G>C p.(Gly178Ala) variant of unknown significance (VUS) in *MUTYH* demonstrated high levels of SBS18+SBS36 in four adenomas and one CRC, providing evidence for reclassification of the VUS to pathogenic. **Conclusions** SBS18+SBS36 and SBS30 were enriched in adenomas at comparable proportions observed in CRCs from biallelic *MUTYH* and biallelic *NTHL1* cases, respectively. Therefore, testing adenomas may improve the identification of biallelic cases and facilitate variant classification, ultimately enabling opportunities for CRC prevention. KEYWORDS * Colorectal cancer * mutational signature * adenoma * hereditary cancer predisposition * SBS18 * SBS36 * SBS30 * *MUTYH* * *NTHL1* * variant of uncertain clinical significance ## INTRODUCTION Identifying people who have an increased risk of developing colorectal cancer (CRC), including people with a hereditary CRC or polyposis syndrome, provides important opportunities for cancer prevention. Individuals with homozygous or compound heterozygous likely pathogenic or pathogenic (LP/P) variants in the base excision repair genes *MUTYH* [1] and *NTHL1* [2] (i.e., biallelic cases) predispose to the development of multiple pre-cancerous adenomas in the colon (adenomatous polyposis), CRC and a spectrum of extra-colonic cancers [2,3]. The application of tumour mutational signature profiling to identify hereditary cancer syndromes related to DNA repair defects has been highlighted [4]. Single base substitution (SBS) and insertion/deletion mutational signatures in CRC have been shown to be accurate predictors of Lynch syndrome and biallelic germline LP/P variants in *MUTYH* [5]. In particular, the combination of SBS18 and SBS36 (SBS18+SBS36) can accurately identify those with germline biallelic *MUTYH* LP/P variants [5,6], while for *NTHL1*, the SBS30 mutational signature has been identified in CRCs from those with biallelic *NTHL1* LP/P variants [7]. Moreover, we previously identified two recurrent somatic mutations, namely the *KRAS* c.34G>T p.(Gly12Cys) and the *PIK3CA* c.1636C>A p.(Gln546Lys) mutations, that were strongly enriched in CRCs from biallelic *MUTYH* cases compared with CRCs from non-hereditary/sporadic cases (*KRAS*: *p-value*=1.4x10-6, *PIK3CA*: *p-value*=3.4x10-4) [6]. A further application of tumour mutational signature profiling is to aid variant classification. Previously, we have shown the presence of elevated levels of SBS18+SBS36 in CRCs provided evidence for an LP/P classification for the germline *MUTYH* variants c.1141G>T p.(Gly381Trp) and c.577-5A>G, where the second allele of *MUTYH* harboured an LP/P variant [6]. Alternatively, the absence of high levels of SBS18+SBS36 in CRCs supported a benign classification for *MUTYH* variants c.912C>G p.(Ser304Arg), c.821G>A p.(Arg274Gln), c.925C>T p.(Arg309Cys) and c.1431G>C p.(Thr477Thr) [6]. While these genomic features have been shown to be effective with CRC-derived data, there are important implications that could be facilitated by the ability to utilise mutational signature profiling in pre-cancerous adenomas namely: 1) identifying biallelic cases early before they develop cancer, including guiding surgical versus endoscopic management decision making, 2) enable pre-emptive genetic counselling and guide patient management strategies through risk assessment, 3) indicate if a second “unidentified” LP/P variant is present in monoallelic LP/P variant carriers, and 4) provide evidence for pathogenicity for variants of uncertain significance (VUS). The aim of this study was to profile and compare the SBS18+SBS36 and SBS30 mutational signatures in adenomas and CRCs from biallelic *MUTYH* and biallelic *NTHL1* cases, respectively, with sporadic adenomas and CRCs from participants without a hereditary CRC/polyposis syndrome to determine their discriminatory potential and ability to inform variant classification. ## MATERIAL AND METHODS ### Study cohort Participants were men and women recruited to one of the following studies: 1) Applying Novel Genomic approaches to Early-onset and suspected Lynch Syndrome colorectal and endometrial cancers (ANGELS, n=4), 2) Colorectal Cancer Family Registry (CCFR, n=21) or 3) Genetics of Colonic Polyposis Study (GCPS, n=5) who were identified to have either germline biallelic *MUTYH* or germline biallelic *NTHL1* LP/P variants from clinical diagnostic or research genetic testing. Formalin-fixed paraffin embedded (FFPE) tissue was collected for tumour mutational signature profiling comprising: 1. 9 adenomas and 15 CRCs from 13 biallelic *MUTYH* cases; 2. 4 CRCs from 4 monoallelic *MUTYH* cases; 3. 7 adenomas, 1 hyperplastic polyp, 1 traditional serrated adenoma and 2 CRCs from 7 biallelic *NTHL1* cases and 4. 2 CRCs from 2 monoallelic *NTHL1* cases. A reference/control group of 46 participants from the CCFR who developed mismatch repair (MMR)-proficient adenomas (n=27) and/or MMR-proficient CRCs (n=26) and who were confirmed to not carry LP/P variants in 16 hereditary CRC/polyposis genes as defined in Seifert *et al.* [8] (i.e., non-hereditary/sporadic cases) were included in this study. The mutational signature profiles of 12 CRCs from eight biallelic *MUTYH* cases and 4 CRCs from 4 monoallelic *MUTYH* cases described above have been reported previously [5,6]. Two CRCs from two biallelic and two monoallelic *NTHL1* cases described above have been reported previously [7]. The studies were approved by the respective ethics committees and institutional review boards. All participants provided written informed consent for collection of tissue and peripheral blood samples. ### Whole-exome sequencing and bioinformatic analysis Adenoma and CRC tissue DNA and matched blood-derived DNA underwent whole-exome sequencing (WES) using the SureSelect Clinical Research Exome v.2 kit (Agilent Technologies, Santa Clara, CA, United States), to a median depth of 357.9 reads (interquartile range (IQR)=287.8-464.0) for FFPE tissue DNA samples and median depth of 179.1 reads (IQR=118.1-204.6) for blood-derived DNA samples. Somatic single-nucleotide variants and short insertion/deletions were determined using the intersection of calls from Strelka (v.2.9.2) [9] and Mutect2 (v.4.0) [10]. Tumour mutation burden (TMB) was calculated as the total number of all somatic single-nucleotide variants and short insertion/deletions observed in a sample divided by the size of the capture region (67Mb). A threshold for including variants was chosen based on a minimum depth (50 reads) and a minimum variant allele frequency of 10% as previously published [5]. Mutational signature profiles were calculated using the simulated annealing method previously described by SignatureEstimation [11] using a reduced set of 16 SBS signatures (**Supplementary Table 1**) as previously determined to be present in the colon/colorectal cancer tissue [5–7,12–19]. The following RefSeq transcripts were used: NM_001128425.1 (*MUTYH*), NM_002528.7 (*NTHL1*), NM_001369786.1 (*KRAS*) and NM_006218.4 (*PIK3CA*). ### Statistical analysis For each signature profile, we compared the biallelic cases with the corresponding CRCs or adenomas from the non-hereditary group. Statistical significance between two groups was determined using a two-sided *t-test* with a *p-value*<0.05 considered to be statistically significant. For group comparisons, one-way *ANOVA* was used. Additionally, we determined the *Cohen’s d* effect size to measure the difference between the means of two subgroups. ### Source code All data analysis was performed using Python v.3.11 [20], Numpy v.1.24 [21] and Scikit-Learn v.1.3 [22]. Data visualisation was done using the R programming language v.4.3.2 [23] and RStudio v.0.16.0 [24] using the following packages: ggplot2 v.3.5.1 [25], cowplot v.1.1.3 [26] and dplyr v.1.1.4 [27]. ## RESULTS The clinicopathological characteristics of the participants and their adenomas and CRCs are shown in **Table 1**. The biallelic *MUTYH* and biallelic *NTHL1* cases are presented in **Supplementary Table 2**. Of note, all adenomas and CRCs were MMR-proficient by immunohistochemistry except for two biallelic *MUTYH* cases; Pat\_301 (2xCRCs at 50 years, one MMR-proficient and one MMR-deficient with MLH1/PMS2 loss), and Pat_315 (1xCRC at 39 years with MSH2/MSH6 loss). The SBS mutational signature profiles of each adenoma and CRC included in the study are presented in **Figure 1**. ![Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/08/09/2024.08.08.24311713/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/F1) Figure 1: Mutational signatures observed across the cohort. View this table: [Table 1.](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/T1) Table 1. The clinicopathological characteristics of the participants and their adenomas and CRCs from each of the biallelic *MUTYH* cases, biallelic *NTHL1* cases and the adenomas and CRCs from the non-hereditary (control) groups included in this study. Overview of the phenotypes by sex, age at diagnosis (including mean and standard deviation), anatomical site, histological type, T stage, grade of tumour and study separated by adenoma and colorectal cancer tissue type and case subgroups. ### The SBS18+SBS36 mutational signature is elevated in both adenomas and CRCs from biallelic MUTYH cases The mean (±standard deviation) proportion of SBS18+SBS36 in the adenomas (65.6%±29.6%) and MMR-proficient CRCs (76.2%±20.5%) from biallelic *MUTYH* cases were not significantly different (*p-value*=0.37) (**Figure 2A**, **Table 2**). This result is further highlighted when comparing the SBS18+SB36 proportions in adenomas and CRCs from the same participant (**Figure 3**). In contrast, the mean proportion of SBS18+SBS36 in adenomas and CRCs from biallelic *MUTYH* cases were significantly higher compared with the mean proportion in non-hereditary adenomas (65.6%±29.6% versus 7.6%±7.0%, *p-value*=3.4x10-4) and CRCs (76.2%±20.5% versus 6.5%±5.5%, *p-value*=2.2x10-8) (**Figure 2A**, **Table 3**). ![Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/08/09/2024.08.08.24311713/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/F2) Figure 2: Boxplots of whole-exome sequencing derived genomic features for A) SBS18+SBS36 proportions in *MUTYH* cases and non-hereditary groups and B) SBS30 proportions in *NTHL1* cases and non-hereditary groups. ![Figure 3.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/08/09/2024.08.08.24311713/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/F3) Figure 3. Line plot displaying the comparison of SBS18+SBS36 signature proportions for adenomas and colorectal cancers related to each biallelic *MUTYH* case and for the participant with a pathogenic and variant of uncertain significance in *MUTYH* (Pat_763). View this table: [Table 2.](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/T2) Table 2. The mean, standard deviation, and range of five genomic features derived from whole-exome sequencing testing for their differences between tissue type and by *MUTYH* or *NTHL1* case or non-hereditary status. Statistically significant p-values are highlighted in **bold**. View this table: [Table 3.](http://medrxiv.org/content/early/2024/08/09/2024.08.08.24311713/T3) Table 3. The mean, standard deviation, and range of five genomic features derived from whole-exome sequencing testing assessed for their differences between *MUTYH* or *NTHL1* case or non-hereditary status for colorectal adenomas and colorectal cancers separately. Statistically significant p-values are highlighted in **bold**. ### Co-occurrence of mutational processes related to defective MUTYH and defective DNA mismatch repair In the two MMR-deficient CRCs from biallelic *MUTYH* cases (Pat_301 and Pat_315), the mean proportion of SBS18+SBS36, was significantly lower compared with the MMR-proficient CRCs from biallelic *MUTYH* cases (20.0%±0.5% versus 76.2%±20.5%, *p-value*=3.8x10-7) but they were still higher compared with non-hereditary CRCs (6.5%±5.5%, *p-value*=2.2x10-8) (**Figure 2A**, **Table 3**). Both these MMR-deficient CRCs also showed higher proportions of SBS15 and SBS44, which are mutational signatures associated with MMR-deficiency (**Figure 1**). In addition, the TMB of these two MMR-deficient CRCs (53.9 and 25.4 mutations/Mb, respectively) was higher compared with the mean TMB of the MMR-proficient CRCs from biallelic *MUTYH* cases (7.5±2.9 mutations/Mb) (**Figure 4**). The cause of MMR-deficiency in Pat\_301 and Pat\_315 was not related to carrying a germline LP/P variant in one of the DNA MMR genes, but rather from two somatic MMR mutations causing biallelic inactivation in each CRC as determined from the WES data. The CRC showing loss of MLH1/PMS2 protein expression from Pat_301 had two somatic mutations in *MLH1* (c.1813G>T p.(Glu605Ter) and c.1816G>T p.(Gly606Ter)) and no evidence of tumour *MLH1* promoter hypermethylation. The CRC showing loss of MSH2/MSH6 protein expression from Pat_315 had a somatic mutation in *MSH2* (c.394G>T p.(Glu132Ter)) and loss of heterozygosity indicating loss of the wildtype *MSH2* allele. The somatic single nucleotide mutations observed in *MLH1* and *MSH2* matched the mutational contexts associated with SBS18 and SBS36 (Pat_301:TCT>TAT and TCC>TAC; Pat_315:TCA>TTA), suggesting the constitutionally defective *MUTYH* contributed to these somatic MMR mutational events and resulted in MMR-deficiency in these two CRCs. Interestingly, the synchronous MMR-proficient CRC from Pat_301 exhibited a high proportion of SBS18+SBS36 (94.2%) and a low TMB (9.4 mutations/Mb), further highlighting the impact of tumour MMR-deficiency on the SBS18+SBS36 signature proportions in biallelic *MUTYH* cases. ### Somatic mutations as biomarkers of biallelic MUTYH status in adenomas Previously, the *KRAS* c.34G>T p.(Gly12Cys) and *PIK3CA* c.1636C>A p.(Gln546Lys) somatic mutations were shown to be recurrent mutations significantly increased in CRCs from biallelic *MUTYH* pathogenic variant cases [6]. In adenomas, the *KRAS* c.34G>T mutation was present in 6/9 (66.7%) and 2/27 (7.4%) of the biallelic *MUTYH* and non-hereditary adenomas, respectively (*p-value*=3.1x10-2). The *KRAS* c.34G>T mutation had a positive predictive value of 75% and a negative predictive value of 89.3% in adenomas compared with a positive predictive value of 100% and negative predictive value of 86.7% in CRCs, indicating that the somatic *KRAS* mutation may not be as clinically useful in identifying biallelic *MUTYH* cases in adenomas as it is in CRCs. The *PIK3CA* c.1636C>A mutation was not observed in adenomas from biallelic *MUTYH* cases or in adenomas from the non-hereditary group (**Supplementary Figure 1**). ### The SBS18+SBS36 mutational signature provides evidence for variant classification We profiled four adenomas and a CRC from Pat_763 who carried a germline heterozygous pathogenic variant (c.1187G>A p.(Gly396Asp)) and a germline heterozygous VUS (c.533G>C p.(Gly178Ala)) in *MUTYH.* All four adenomas (mean proportion: 73.0%±14.9%, range: 57.3%-88.0%) and the CRC (72.7%) demonstrated high proportions of SBS18+SBS36 consistent with germline biallelic inactivation of *MUTYH* (**Figure 2A**, **Figure 3**). No somatic second hits in *MUTYH* were observed that may have accounted for the high SBS18+SBS36 signature proportions in the adenomas and CRC. These findings support a reclassification of the *MUTYH* c.533G>C p.(Gly178Ala) variant as likely pathogenic. ### The SBS30 mutational signature is elevated in both adenomas and CRCs from biallelic NTHL1 cases The mean proportion of SBS30 in adenomas (74.5%±9.4%) and CRCs (78.8%±2.4%) from biallelic *NTHL1* cases were not significantly different (*p-value*=0.31) (**Figure 2B**, **Table 2**). The mean proportion of SBS30 in adenomas from biallelic *NTHL1* cases was, however, significantly higher compared with the mean proportion in non-hereditary adenomas (74.5%±9.4% versus 2.8%±1.3%; p-value=5.1x10-7) (**Figure 2B**, **Table 3**). In addition to 7 adenomas and 2 CRCs, a hyperplastic polyp and a traditional serrated adenoma from two biallelic *NTHL1* cases (Pat_005 and Pat_469) were tested. Of note, the traditional serrated adenoma showed high proportion of SBS30 at 69.4%, whereas the SBS30 proportion in the hyperplastic polyp was only 6.2% (**Figure 2B**). ## DISCUSSION In this study, we showed that the SBS18+SBS36 and SBS30 mutational signatures associated with biallelic *MUTYH* and biallelic *NTHL1* deficiencies, were present in adenomas at similar proportions to those observed in CRCs and were significantly higher when compared with the proportions observed in non-hereditary adenomas and CRCs. Together, these results demonstrate the presence of these mutational processes and consequent mutational signatures, at diagnostic levels in the pre-malignant stage, thereby enabling the opportunity for early CRC detection by expanding the potential tissue available for profiling. We identified two scenarios where SBS30 or SBS18+SBS36 may present with limitations. Firstly, although SBS30 was shown to be a predominant mutational signature in adenomas from biallelic *NTHL1* cases, our results showed variable presence of SBS30 in two serrated polyp subtypes, 69.4% in the traditional serrated adenoma and only 6.2% in the hyperplastic polyp. As biallelic *NTHL1* cases can present with mixed polyp types [28], further research is needed to determine the utility of testing serrated polyps for mutational signatures for *NTHL1* and more broadly for other hereditary CRC/polyposis syndromes. Secondly, we tested two MMR-deficient CRCs from two biallelic *MUTYH* cases where the mutational signature profile showed defective MMR related to the presence of SBS15 and SBS44 and a hypermutated TMB that co-occurred with the SBS18+SBS36 signature, albeit at lower proportions than observed in MMR-proficient CRCs from biallelic *MUTYH* cases. These findings highlight MMR-deficiency as an important diagnostic caveat for utilising SBS18+SBS36 to identify biallelic *MUTYH* cases or for classifying variants. This study extends on our previous work for applying SBS18+SBS36 in CRCs to reclassify VUSs in *MUTYH* [5]. We showed high levels of SBS18+SBS36 in the CRC and multiple adenomas from the same person provides high confidence that the *MUTYH* c.533G>C p.(Gly178Ala) variant is pathogenic. Additional evidence related to its absence in gnomAD and from *in-silico* predictions from REVEL, SIFT, PolyPhen-2 and Align-GVGD suggest this missense change affects protein function, further supporting pathogenicity ([https://www.ncbi.nlm.nih.gov/clinvar/variation/481808/](https://www.ncbi.nlm.nih.gov/clinvar/variation/481808/), last accessed date: August 1st, 2024). The ability to test multiple independent adenomas/CRCs provides high confidence for variant classification where all or none of the lesions have the signature. The clinical genetics community is increasingly challenged by VUS, where around half (47.8%, 1329/2782) of *MUTYH* variants in ClinVar are currently classified as VUS (stand: August 6th, 2024) [29]. Approaches to classify variants with existing and widely used infrastructure i.e., next generation sequencing and validated bioinformatic tools, will aid in reclassifying variants and optimising clinical management and cancer prevention for the patient and their relatives. Limitations of this study include the lack of ethnic diversity within the case and non-hereditary groups which were predominantly white European. Similarly, there was a limited range of germline LP/P variants for both *MUTYH* and *NTHL1*. The consistency of mutational signature findings across a broader group of cases of different pathogenic variants and ethnic backgrounds would provide evidence of the robustness of this approach. All of the CRCs and adenomas tested in this study were from FFPE tissue, however we have previously shown that mutational signature profiling is effective in both FFPE and fresh frozen tissue DNA samples [5]. ## CONCLUSIONS This study provides important findings demonstrating that testing adenomas for SBS18+SBS36 or SBS30 can be an equally effective alternative to identifying biallelic *MUTYH* or biallelic *NTHL1* cases, respectively, if CRC has not yet developed or tissue is not available. This provides important opportunities for clinical management decision-making such as colectomy versus endoscopic polypectomy for CRC prevention given the established high CRC penetrance in biallelic cases. Furthermore, the specificity of these signatures enables the utility of mutational signature profiling to classify VUS. Our study identified potential caveats to using mutational signatures diagnostically, namely, the presence of MMR-deficiency which may diminish the SBS18+SBS36 signature, while for SBS30, testing of serrated polyps needs further investigation. This study adds to the growing evidence of the clinical utility of gene specific mutational signature profiling for identifying hereditary CRC/polyposis syndromes and further expands the opportunities to utilise mutational signatures as a supportive feature for variant classification. ## Supporting information Supplementary Material [[supplements/311713_file03.pdf]](pending:yes) ## Data Availability The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. ## AVAILABLILITY OF DATA AND MATERIALS The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. ## FUNDING Funding by a National Health and Medical Research Council of Australia (NHMRC) Investigator grant GNT1194896 awarded to DDB and funding by a Cure Cancer Early Career Research Grant awarded to PG supported the design, analysis, and interpretation of data. DDB is supported by a University of Melbourne Dame Kate Campbell Fellowship. PG is supported by an NHMRC Investigator Grant (2026331). RW is supported by the University of Melbourne Early Career Researcher Grant. BJP is supported by a Victorian Health and Medical Research Fellowship from the Victorian Government. AKW is supported by an NHMRC Investigator grant (GNT1194392). JLH is supported by the University of Melbourne Dame Kate Campbell Fellowship. MAJ is supported by an NHMRC Investigator grant (GNT1195099). The Colon Cancer Family Registry (CCFR, [www.coloncfr.org](http://www.coloncfr.org)) is supported in part by funding from the National Cancer Institute (NCI), National Institutes of Health (NIH) (award U01 CA167551). Support for case ascertainment was provided in part from the Surveillance, Epidemiology, and End Results (SEER) Program and the following U.S. state cancer registries: AZ, CO, MN, NC, NH; and by the Victoria Cancer Registry (Australia) and Ontario Cancer Registry (Canada). The content of this manuscript does not necessarily reflect the views or policies of the NIH or any of the collaborating centres in the CCFR, nor does mention of trade names, commercial products, or organisations imply endorsement by the US Government, any cancer registry, or the CCFR. ## AUTHOR CONTRIBUTIONS **Romy Walker:** Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Funding acquisition. **Jihoon E. Joo:** Software, Formal analysis, Investigation, Writing - Review & Editing, Visualization. **Khalid Mahmood:** Software, Investigation, Writing - Review & Editing. **Mark Clendenning:** Investigation, Writing - Review & Editing. **Julia Como:** Investigation, Writing - Review & Editing. **Susan G. Preston:** Investigation, Writing - Review & Editing. **Sharelle Joseland:** Resources, Writing - Review & Editing. **Bernard J. Pope:** Software, Writing - Review & Editing, Funding acquisition. **Ana B. D. Medeiros:** Formal analysis, Resources, Writing - Review & Editing. **Brenely V. Murillo:** Resources, Writing - Review & Editing. **Nicholas Pachter:** Resources, Writing - Review & Editing. **Kevin Sweet:** Resources, Writing - Review & Editing. **Allan D. Spigelman:** Resources, Writing - Review & Editing. **Alexandra Groves:** Resources, Writing - Review & Editing. **Margaret Gleeson:** Resources, Writing - Review & Editing. **Krzysztof Bernatowicz:** Resources, Writing - Review & Editing. **Nicola Poplawski:** Resources, Writing - Review & Editing. **Lesley Andrews:** Resources, Writing - Review & Editing. **Emma Healey:** Resources, Writing - Review & Editing. **Steven Gallinger:** Resources, Writing - Review & Editing. **Robert C. Grant:** Resources, Writing - Review & Editing. **Aung K. Win:** Resources, Writing - Review & Editing, Funding acquisition. **John L. Hopper:** Resources, Writing - Review & Editing, Funding acquisition. **Mark A. Jenkins:** Resources, Writing - Review & Editing, Funding acquisition. **Giovana T. Torrezan:** Conceptualization, Resources, Data Curation, Writing - Review & Editing. **Christophe Rosty:** Data Curation, Writing - Review & Editing. **Finlay A. Macrae:** Resources, Writing - Review & Editing. **Ingrid M. Winship:** Resources, Writing - Review & Editing. **Daniel D. Buchanan:** Conceptualization, Methodology, Validation, Data Curation, Investigation, Resources, Writing - Review & Editing, Supervision, Project administration, Funding acquisition. **Peter Georgeson:** Conceptualization, Methodology, Software, Validation, Data Curation, Formal analysis, Investigation, Writing - Review & Editing, Visualization, Supervision, Funding acquisition. ## DECLARATION OF INTEREST STATEMENT Robert C. Grant received a scholarship from Pfizer and provided consulting or advisory roles for Astrazeneca, Tempus, Eisai, Incyte, Knight Therapeutics, Guardant Health, and Ipsen. All other authors have no relevant financial or non-financial interests to disclose. ## ACKNOWLEDGEMENTS The authors thank members of the Colorectal Oncogenomics Group and members from the Genomic Medicine and Family Cancer Clinic for their support of this manuscript. We thank the participants and staff from the ANGELS study, GCPS study, and from the Australasian and Ontario sites of the CCFR, in particular Maggie Angelakos, Samantha Fox, Cary Greenberg and Allyson Templeton for their support of this manuscript. The CCFR graciously thanks the generous contributions of their study participants, dedication of study staff, and the financial support from the U.S. National Cancer Institute, without which this important registry would not exist. The authors thank Melbourne Bioinformatics and the Australian Genome Research Facility for their collaboration on this project. ## Footnotes * # Joint senior authors * Received August 8, 2024. * Revision received August 8, 2024. * Accepted August 9, 2024. * © 2024, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## REFERENCES 1. [1].Al-Tassan N, Chmiel NH, Maynard J, Fleming N, Livingston AL, Williams GT, et al. Inherited variants of MYH associated with somatic G:C→T:A mutations in colorectal tumors. Nat Genet 2002;30:227–32. doi:10.1038/ng828. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng828&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11818965&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000173708700027&link_type=ISI) 2. [2].Weren RDA, Ligtenberg MJL, Kets CM, de Voer RM, Verwiel ETP, Spruijt L, et al. A germline homozygous mutation in the base-excision repair gene NTHL1 causes adenomatous polyposis and colorectal cancer. Nat Genet 2015;47:668–71. doi:10.1038/ng.3287. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3287&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25938944&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 3. [3].Win AK, Reece JC, Dowty JG, Buchanan DD, Clendenning M, Rosty C, et al. Risk of extracolonic cancers for people with biallelic and monoallelic mutations in MUTYH. Int J Cancer 2016;139:1557–63. doi:10.1002/ijc.30197. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ijc.30197&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 4. [4].Grolleman JE, Díaz-Gay M, Franch-Expósito S, Castellví-Bel S, de Voer RM. Somatic mutational signatures in polyposis and colorectal cancer. Mol Aspects Med 2019;69:62–72. doi:10.1016/j.mam.2019.05.002. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.mam.2019.05.002&link_type=DOI) 5. [5].Georgeson P, Pope BJ, Rosty C, Clendenning M, Mahmood K, Joo JE, et al. Evaluating the utility of tumour mutational signatures for identifying hereditary colorectal cancer and polyposis syndrome carriers. Gut 2021;70:2138–49. doi:10.1136/gutjnl-2019-320462. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ3V0am5sIjtzOjU6InJlc2lkIjtzOjEwOiI3MC8xMS8yMTM4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMDgvMDkvMjAyNC4wOC4wOC4yNDMxMTcxMy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 6. [6].Georgeson P, Harrison TA, Pope BJ, Zaidi SH, Qu C, Steinfelder RS, et al. Identifying colorectal cancer caused by biallelic MUTYH pathogenic variants using tumor mutational signatures. Nat Commun 2022;13:3254. doi:10.1038/s41467-022-30916-1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-022-30916-1&link_type=DOI) 7. [7].Grolleman JE, de Voer RM, Elsayed FA, Nielsen M, Weren RDA, Palles C, et al. Mutational Signature Analysis Reveals NTHL1 Deficiency to Cause a Multi- tumor Phenotype. Cancer Cell 2019;35:256–266.e5. doi:10.1016/j.ccell.2018.12.011. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ccell.2018.12.011&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 8. [8].Seifert BA, McGlaughon JL, Jackson SA, Ritter DI, Roberts ME, Schmidt RJ, et al. Determining the clinical validity of hereditary colorectal cancer and polyposis susceptibility genes using the Clinical Genome Resource Clinical Validity Framework. Genet Med Off J Am Coll Med Genet 2019;21:1507–16. doi:10.1038/s41436-018-0373-1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41436-018-0373-1&link_type=DOI) 9. [9].Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 2012;28:1811–7. doi:10.1093/bioinformatics/bts271. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bts271&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22581179&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000306136100002&link_type=ISI) 10. [10].Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213–9. doi:10.1038/nbt.2514. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nbt.2514&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23396013&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 11. [11].Huang X, Wojtowicz D, Przytycka TM. Detecting presence of mutational signatures in cancer with confidence. Bioinformatics 2018;34:330–7. doi:10.1093/bioinformatics/btx604. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btx604&link_type=DOI) 12. [12].Crisafulli G, Sartore-Bianchi A, Lazzari L, Pietrantonio F, Amatu A, Macagno M, et al. Temozolomide Treatment Alters Mismatch Repair and Boosts Mutational Burden in Tumor and Blood of Colorectal Cancer Patients. Cancer Discov 2022;12:1656–75. doi:10.1158/2159-8290.CD-21-1434. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1158/2159-8290.CD-21-1434&link_type=DOI) 13. [13].Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 2019;47:D941–7. doi:10.1093/nar/gky1015. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gky1015&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30371878&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 14. [14].Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101. doi:10.1038/s41586-020-1943-3. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-1943-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32025018&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 15. [15].Viel A, Bruselles A, Meccia E, Fornasarig M, Quaia M, Canzonieri V, et al. A Specific Mutational Signature Associated with DNA 8-Oxoguanine Persistence in MUTYH-defective Colorectal Cancer. EBioMedicine 2017;20:39–49. doi:10.1016/j.ebiom.2017.04.022. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ebiom.2017.04.022&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28551381&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 16. [16].Everall A, Tapinos A, Hawari A, Cornish A, Sud A, Chubb D, et al. Comprehensive repertoire of the chromosomal alteration and mutational signatures across 16 cancer types from 10,983 cancer patients 2023:2023.06.07.23290970. doi:10.1101/2023.06.07.23290970. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMy4wNi4wNy4yMzI5MDk3MHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMDgvMDkvMjAyNC4wOC4wOC4yNDMxMTcxMy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 17. [17].Gurjao C, Zhong R, Haruki K, Li YY, Spurr LF, Lee-Six H, et al. Discovery and Features of an Alkylating Signature in Colorectal Cancer. Cancer Discov 2021;11:2446–55. doi:10.1158/2159-8290.CD-20-1656. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiY2FuZGlzYyI7czo1OiJyZXNpZCI7czoxMDoiMTEvMTAvMjQ0NiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzA4LzA5LzIwMjQuMDguMDguMjQzMTE3MTMuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 18. [18].Pleguezuelos-Manzano C, Puschhof J, Rosendahl Huber A, van Hoeck A, Wood HM, Nomburg J, et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature 2020;580:269–73. doi:10.1038/s41586-020-2080-8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2080-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32106218&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 19. [19].Walker R, Mahmood K, Joo JE, Clendenning M, Georgeson P, Como J, et al. A tumor focused approach to resolving the etiology of DNA mismatch repair deficient tumors classified as suspected Lynch syndrome. J Transl Med 2023;21:282. doi:10.1186/s12967-023-04143-1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12967-023-04143-1&link_type=DOI) 20. [20].Van Rossum G, Drake FL. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace; 2009. 21. [21].Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature 2020;585:357–62. doi:10.1038/s41586-020-2649-2. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2649-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32939066&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 22. [22].Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res 2011;12:2825–30. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cpc.2010.04.018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23755062&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom) 23. [23].R Core Team. R: a language and environment for statistical computing 2020. [https://www.R-project.org/](https://www.R-project.org/) (accessed October 5, 2022). 24. [24].Allaire JJ. RStudio: Integrated Development Environment for R n.d. 25. [25].Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2016. 26. [26].Wilke CO. cowplot: Streamlined Plot Theme and Plot Annotations for “ggplot2.” R package version 113 2024. 27. [27].Wickham H, François R, Henry L, Müller K, Vaughan D, Software P, et al. dplyr: A Grammar of Data Manipulation 2023. 28. [28].Weren RD, Ligtenberg MJ, Geurts van Kessel A, De Voer RM, Hoogerbrugge N, Kuiper RP. NTHL1 and MUTYH polyposis syndromes: two sides of the same coin? J Pathol 2018;244:135–42. doi:10.1002/path.5002. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/path.5002&link_type=DOI) 29. [29].Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 2018;46:D1062–7. doi:10.1093/nar/gkx1153. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkx1153&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29165669&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F09%2F2024.08.08.24311713.atom)