medRxiv

Large-scale integration of omics and electronic health records to identify potential risk protein biomarkers and therapeutic drugs for cancer prevention and intervention

Qing Li, Qingyuan Song, Zhishan Chen, Jungyoon Choi, Victor Moreno, Jie Ping, Wanqing Wen, Chao Li, Xiang Shu, Jun Yan, Xiao-ou Shu, Qiuyin Cai, Jirong Long, Jeroen R Huyghe, Rish Pai, Stephen B Gruber, Graham Casey, Xusheng Wang, Adetunji T. Toriola, Li Li, Bhuminder Singh, Ken S Lau, Li Zhou, Chong Wu, Ulrike Peters, Wei Zheng, Quan Long, Zhijun Yin, Xingyi Guo
doi: https://doi.org/10.1101/2024.05.29.24308170
Qing Li
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
2Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
Qingyuan Song
3Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
4Department of Computer Science, Vanderbilt University, Nashville, TN, USA
Zhishan Chen
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Jungyoon Choi
5Division of Oncology and Hematology, Department of Internal Medicine, Korea University Ansan Hospital, Korea University College of Medicine, Ansan 15355, Korea
Victor Moreno
6Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
7Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
8Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona (UB), L’Hospitalet de Llobregat, Barcelona, Spain
9Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
Jie Ping
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Wanqing Wen
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Chao Li
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Xiang Shu
10Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Jun Yan
11Physiology and Pharmacology, University of Calgary, Calgary, Alberta, Canada
Xiao-ou Shu
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Qiuyin Cai
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Jirong Long
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Jeroen R Huyghe
12Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Rish Pai
13Mayo Clinic Arizona, Scottsdale, AZ, USA
Stephen B Gruber
14Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Graham Casey
15Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
Xusheng Wang
16Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Adetunji T. Toriola
17Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine and Siteman Cancer Center, St. Louis, MO, USA
Li Li
18Department of Family Medicine, School of Medicine, University of Virginia, Charlottesville, VA, USA
Bhuminder Singh
19Epithelial Biology Center and Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
Ken S Lau
19Epithelial Biology Center and Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
Li Zhou
20Harvard Medical School, Boston, Massachusetts; Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
Chong Wu
21Department of Biostatistics, the University of Texas MD Anderson, Houston, Texas, USA
Ulrike Peters
12Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
22Department of Epidemiology, University of Washington School of Public Health, Seattle, WA, USA
Wei Zheng
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Quan Long
2Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
23Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
24The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
25Alberta Children’s Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
26Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, Alberta, Canada
  • For correspondence: xingyi.guo{at}vumc.org zhijun.yin.1{at}vumc.org quan.long{at}ucalgary.ca
Zhijun Yin
3Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
4Department of Computer Science, Vanderbilt University, Nashville, TN, USA
Xingyi Guo
1Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
3Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA

Abstract

Identifying risk protein targets and their therapeutic drugs is crucial for effective cancer prevention. Here, we conduct integrative and fine-mapping analyses of large genome-wide association study (GWAS) data for breast, colorectal, lung, ovarian, pancreatic, and prostate cancers, and characterize 710 lead variants independently associated with cancer risk. Through mapping protein quantitative trait loci (pQTL) for these variants using plasma proteomics data from over 75,000 participants, we identify 365 proteins associated with cancer risk. Subsequent colocalization analysis identifies 101 proteins, including 74 not reported in previous studies. We further characterize 36 potentially druggable proteins for cancers or other disease indications. Analyzing >3.5 million electronic health records, we uncover five drugs (Haloperidol, Trazodone, Tranexamic Acid, Haloperidol, and Captopril) associated with increased cancer risk and two drugs (Caffeine and Acetazolamide) linked to reduced colorectal cancer risk. This study offers novel insights into therapeutic drugs targeting risk proteins for cancer prevention and intervention.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the US National Institutes of Health grants 1R37CA227130-01A1 and R01CA269589-01A1 to X.G. The data analyses were conducted using the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. This work was also supported by the New Frontiers in Research Fund (NFRFE-2018-00748) and an NSERC Discovery Grant (RGPIN-2024-04679) to Q.L. The computational infrastructure was partly supported by a Canada Foundation for Innovation JELF grant (36605) to Q.L.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used only openly available human data. The GWAS summary statistics for the six common cancers can be accessed through their publications: breast cancer (PMID: 32424353), ovarian cancer (PMID: 28346442), prostate cancer (PMID: 29892016), colorectal cancer (PMID: 30510241), lung cancer (PMID: 28604730), and pancreatic cancer (PMID: 31917448). Metadata and pQTL summary statistics from UKB-PPP can be downloaded from Synapse (Project SynID: syn51364943); pQTL data from ARIC (ref. 46) and deCODE genetics (ref. 47) can be accessed through previous publications (PMID: 34857953 and PMID: 35501419). Functional genomic data include TCGA and CPTAC differential expression results, accessible through https://ualcan.path.uab.edu/index.html; 4DGenome: https://4dgenome.research.chop.edu/; DepMap: https://depmap.org/portal/; FANTOM5: http://fantom.gsc.riken.jp/5/; HaploReg: https://pubs.broadinstitute.org/mammals/haploreg/; GTEx: https://gtexportal.org/home/. GENCODE (v26.GRCh38) was downloaded from https://www.gencodegenes.org/human/release_26.html. National Cancer Institute drug information can be accessed through https://www.cancer.gov/about-cancer/treatment/drugs; the Cancer Gene Census (CGC) can be accessed via the COSMIC website: https://cancer.sanger.ac.uk/census. Drug and compound data can be downloaded from the following URLs: ChEMBL: https://www.ebi.ac.uk/chembl/; Therapeutic Target Database: https://db.idrblab.net/ttd/; Open Targets: https://www.opentargets.org/; DrugBank: https://go.drugbank.com/. The EHR data, containing de-identified clinical information, can be accessed through the VUMC SD database. Data are available through restricted access for approved studies and researchers who agree to specific conditions of use.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • † These authors share co-first authorship.

Data availability

Supplementary Table 1 provides the download information for the GWAS summary statistics for the six common cancers: breast, ovarian, prostate, colorectal, lung, and pancreatic. Metadata and pQTL summary statistics from UKB-PPP can be downloaded from Synapse (Project SynID: syn51364943); pQTL data from ARIC (ref. 46) and deCODE genetics (ref. 47) can be accessed through previous publications (PMID: 34857953 and PMID: 35501419). Functional genomic data include TCGA and CPTAC differential expression results, accessible through https://ualcan.path.uab.edu/index.html; 4DGenome: https://4dgenome.research.chop.edu/; DepMap: https://depmap.org/portal/; FANTOM5: http://fantom.gsc.riken.jp/5/; HaploReg: https://pubs.broadinstitute.org/mammals/haploreg/; GTEx: https://gtexportal.org/home/. GENCODE (v26.GRCh38) was downloaded from https://www.gencodegenes.org/human/release_26.html. National Cancer Institute drug information can be accessed through https://www.cancer.gov/about-cancer/treatment/drugs; the Cancer Gene Census (CGC) can be accessed via the COSMIC website: https://cancer.sanger.ac.uk/census. Drug and compound data can be downloaded from the following URLs: ChEMBL: https://www.ebi.ac.uk/chembl/; Therapeutic Target Database: https://db.idrblab.net/ttd/; Open Targets: https://www.opentargets.org/; DrugBank: https://go.drugbank.com/. The EHR data, containing de-identified clinical information, can be accessed through the VUMC SD database. Data are available through restricted access for approved studies and researchers who agree to specific conditions of use.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted May 31, 2024.

Subject Area

  • Genetic and Genomic Medicine