Abstract
Alternative polyadenylation (APA) plays an important role in cancer initiation and progression; however, current genome- and transcriptome-wide association studies (GWAS and TWAS, respectively) mostly ignore APA when identifying putative cancer susceptibility genes. Here, we performed a pan-cancer 3′ untranslated region (UTR) APA TWAS (3′aTWAS) by integrating 80 well-powered (n>50,000) GWAS datasets across 23 major cancer types with APA quantification from 17,330 RNA sequencing samples across 49 tissue types and 949 individuals. We found that genetic variants associated with APA represent around 24.4% of cancer GWAS variants and are more likely to be causal variants explaining a large portion of cancer heritability. We further identified 413 significant APA-linked cancer susceptibility genes. Of these, 77.4% have been overlooked by traditional expression- and splicing-based studies, given that APA may regulate translation, protein localization, and protein–protein interactions independent of the expression level of the genes or splicing isoforms. As proof-of-principle validation, modulation of four novel APA-linked breast cancer susceptibility genes significantly altered cancer cell proliferation. Our study highlights the significant role of APA in discovering new cancer susceptibility genes and provides a strong foundational framework for enhancing our understanding of the etiology underlying human cancers.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the National Key Research and Development Program of China (no. 2022YFA1302800) to L.D.; the National Natural Science Foundation of China (no. 32100533) and open grant funds from Shenzhen Bay Laboratory (no. SZBL2021080601001) to L.L.; and the National Natural Science Foundation of China (no. 32270779) to L.D.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Raw whole transcriptome and genome sequencing data from the Genotype-Tissue Expression (GTEx) project are available via the database of Genotypes and Phenotypes (dbGaP), under the accession number phs000424.v8.p2. All processed GTEx data are available via the GTEx portal (http://gtexportal.org/). GWAS summary statistics are from the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/), UK Biobank GWAS (http://www.nealelab.is/uk-biobank/), FinnGen (https://www.finngen.fi/en) and JENGER (http://jenger.riken.jp). The details, including accession numbers, of GWAS summary statistics used in this study are listed in Supplementary Table 1.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Raw whole transcriptome and genome sequencing data from the Genotype-Tissue Expression (GTEx) project are available via the database of Genotypes and Phenotypes (dbGaP), under the accession number phs000424.v8.p2. All processed GTEx data are available via the GTEx portal (http://gtexportal.org/). GWAS summary statistics are from the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/), UK Biobank GWAS (http://www.nealelab.is/uk-biobank/), FinnGen (https://www.finngen.fi/en) and JENGER (http://jenger.riken.jp). The details, including accession numbers, of GWAS summary statistics used in this study are listed in Supplementary Table 1. The 1000 Genomes Project reference for LDSC is available at https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz and the 1000 Genomes Project reference with regression weights for LDSC at https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_weights_hm3_no_MHC.tgz. All significant 3′aTWAS genes in cancer are listed in Supplementary Table 6. The expression and splicing TWAS models for GTEx v8 are publicly available at PredictDB (https://predictdb.org/).