medRxiv

LIRIC predicts Hepatocellular Carcinoma risk in the diverse U.S. population using routine clinical data

Kai Jia, Bowen Gu, Pasapol Saowakon, Steven Kundrot, Matvey B. Palchuk, Jeff Warnick, Irving D. Kaplan, Martin Rinard, Limor Appelbaum
doi: https://doi.org/10.1101/2024.05.28.24307949
Kai Jia
1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Bowen Gu
4 Dana-Farber Cancer Institute, Boston, Massachusetts, USA
5 Brigham and Women’s Hospital, Boston, Massachusetts, USA
6 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Pasapol Saowakon
1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Steven Kundrot
2 TriNetX, LLC, Cambridge, Massachusetts, USA
Matvey B. Palchuk
2 TriNetX, LLC, Cambridge, Massachusetts, USA
Jeff Warnick
2 TriNetX, LLC, Cambridge, Massachusetts, USA
Irving D. Kaplan
3 Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
Martin Rinard
1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Limor Appelbaum
3 Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  • For correspondence: lappelb1{at}bidmc.harvard.edu

Abstract

Background and Aims Hepatocellular Carcinoma (HCC) is often diagnosed late, limiting curative treatment options. Conversely, early detection in cirrhotic patients through screening offers high cure rates but is underutilized and misses cases occurring in individuals without cirrhosis. We aimed to build, validate, and simulate the deployment of models for HCC risk stratification using routinely collected Electronic Health Record (EHR) data from a geographically and racially diverse U.S. population.

Methods We developed Logistic Regression (LiricLR) and Neural Network (LiricNN) models for the general population (GP) and the cirrhosis population using EHR data from 46,79 HCC cases and 1,128,202 controls aged 40-100 years. Data were sourced from 64 Health Care Organizations (HCOs) in a federated network, spanning academic medical centers, community hospitals, and outpatient clinics nationwide. We evaluated model performance using AUC, calibration plots, and the Geometric Mean of Overestimation (GMOE), defined as the geometric mean of the ratios of predicted to actual risks. External validation covered HCO location, race, and temporal factors. Simulated deployment assessed sensitivity, specificity, Positive Predictive Value, and Number Needed to Screen at each risk threshold.
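To make the GMOE definition concrete, the following is a minimal sketch, not the authors' implementation. The abstract only states that GMOE is the geometric mean of ratios of predicted to actual risks; the grouping of patients into equal-size bins by predicted risk, the function name `gmoe`, and the parameter `n_bins` are assumptions introduced here for illustration.

```python
import math


def gmoe(predicted, outcomes, n_bins=4):
    """Geometric mean of (mean predicted risk / observed event rate) over bins.

    ASSUMPTION: patients are sorted by predicted risk and split into
    equal-size bins; the source does not specify how ratios are grouped.
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    pairs = sorted(zip(predicted, outcomes))
    size = len(pairs) // n_bins
    if size == 0:
        raise ValueError("not enough samples for the requested number of bins")

    log_ratios = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any remainder.
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        if mean_pred == 0 or observed == 0:
            continue  # ratio undefined; this sketch simply skips the bin
        log_ratios.append(math.log(mean_pred / observed))

    if not log_ratios:
        raise ValueError("no bin had a defined predicted/observed ratio")
    # Geometric mean computed as exp of the mean log ratio.
    return math.exp(sum(log_ratios) / len(log_ratios))
```

Under this definition, a value of 1 means predicted and observed risks agree on average, a value above 1 means the model overestimates risk, and a value below 1 means it underestimates risk.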

Results LiricLR and LiricNN (GP) achieved test-set AUCs of 0.8968 (95% CI: 0.8925, 0.9010) and 0.9254 (95% CI: 0.9218, 0.9289), respectively, leveraging 46 established (cirrhosis, hepatitis, diabetes) and novel (frequency of clinical encounters, platelet, albumin, aminotransferase values) features. Average external validation AUCs of LiricNN were 0.9274 (95% CI: 0.9239, 0.9308) for locations and 0.9284 (95% CI: 0.9247, 0.9320) for races. The average GMOE was 0.887 (95% CI: 0.862, 0.911). Simulated model deployment of LiricNN provides performance metrics across multiple risk thresholds.
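The threshold-based metrics used in the simulated deployment can be illustrated with a short sketch. This is a hedged example under stated assumptions, not the study's code: the function name `screening_metrics` is hypothetical, a patient is treated as flagged when the risk score is at or above the threshold, and Number Needed to Screen is taken to be 1/PPV (flagged patients per true case), which the abstract does not spell out.

```python
import math


def screening_metrics(risks, outcomes, threshold):
    """Sensitivity, specificity, PPV, and NNS when flagging risk >= threshold."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= threshold
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else math.nan

    ppv = ratio(tp, tp + fp)
    # ASSUMPTION: NNS = 1 / PPV, i.e. flagged patients per true case found.
    nns = 1 / ppv if ppv and not math.isnan(ppv) else math.inf
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ppv,
        "nns": nns,
    }


# Toy data, invented purely for illustration.
toy_risks = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
toy_outcomes = [1, 1, 0, 1, 0, 0]
for t in (0.15, 0.5, 0.85):
    print(t, screening_metrics(toy_risks, toy_outcomes, t))
```

Sweeping the threshold as in the loop above shows the usual trade-off: a lower threshold flags more patients and raises sensitivity, while a higher threshold raises specificity and tends to lower the number needed to screen.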

Conclusions Liric models utilize routine EHR data to accurately predict risk of HCC development. Their scalability, generalizability, and interpretability set the stage for future clinical deployment and the design of more effective screening programs.

Lay Summary Hepatocellular Carcinoma (HCC), the most common liver cancer, is often diagnosed in late stages, limiting treatment options. Early detection through screening is essential for effective intervention and potential cure. However, current screening mostly targets patients with liver cirrhosis, many of whom do not get screened, while missing others who could develop HCC even without cirrhosis.

To improve screening, we created and tested Liric (LIver cancer RIsk Computation) models. These models use routine medical records from across the country to identify people at high risk of developing HCC.

Liric models have several benefits. Firstly, they can increase awareness among primary care physicians (PCPs) nationwide, improving the utilization of HCC screening. This is particularly crucial in areas with socio-demographic disparities, where access to specialist physicians may be limited. Additionally, Liric models can identify patients who would be missed by current screening guidelines, ensuring a more comprehensive approach to HCC detection.

Liric can be integrated into EHR systems to automatically generate a risk score from routinely collected patient data. This risk score can help physicians and caregivers make informed decisions about the need for HCC screening. It can also be used to develop cost-effective screening programs by identifying populations in which screening is effective.

Figure

Highlights

  • Screening detects HCC early but is underutilized and misses cases without cirrhosis

  • We developed, validated, and simulated deployment of Liric to identify individuals at high risk for HCC

  • Liric uses routinely collected clinical and lab data from a diverse US population

  • Liric accurately predicts risk of HCC 6-36 months before it occurs

  • Liric can assist PCPs in identifying individuals most in need of screening

Impacts and implications Effective screening for hepatocellular carcinoma (HCC) is vital to achieve early detection and improved cure rates. However, the existing screening approach primarily targets patients with liver cirrhosis; it is underutilized and fails to identify at-risk patients without underlying cirrhosis.

Implementation of Liric models has the potential to enhance nationwide awareness among primary care physicians (PCPs), and improve screening utilization for hepatocellular carcinoma (HCC), particularly in regions characterized by socio-demographic disparities. Furthermore, these models can help identify patients who are currently overlooked by existing screening guidelines and aid in the development of new, more effective guidelines.

Integration of Liric models into EHR systems via a federated network would enable automatic generation of risk scores using unfiltered patient data. This approach could more accurately identify at-risk patients, providing valuable information to caregivers for HCC screening.

Competing Interest Statement

KJ and MR are not aware of any payments or services, paid to themselves or MIT, that could be perceived to influence the submitted work. IK and LA are not aware of any payments or services, paid to themselves or BIDMC, that could be perceived to influence the submitted work. During the time the research was performed MR received consulting fees and payment for expert testimony for Comcast, Google, Motorola, Qualcomm, and IBM, is a member of the scientific advisory board and owns stock at Vali Cyber, and acknowledges support from Boeing, DARPA, and the NSF for salary and research support including meeting attendance and travel. MR has the following patents: United States Patent 10,539,419. Method and apparatus for reducing sensor power dissipation. Phillip Stanley-Marbell, Martin Rinard. United States Patent 10,135,471. System, method, and apparatus for reducing power dissipation of sensor data on bit-serial communication interfaces. Phillip Stanley-Marbell, Martin Rinard. United States Patent 9,189,254. Translating text to, merging, and optimizing graphical user interface tasks. Nathaniel Kushman, Regina Barzilay, Satchuthananthavale Branavan, Dina Katabi, Martin Rinard. United States Patent 8,839,221. Automatic acquisition and installation of software upgrades for collections of virtual machines. Constantine Sapuntzakis, Martin Rinard, Gautam Kachroo. United States Patent 8,788,884. Automatic correction of program logic. Jeff Perkins, Stylianos Sidiroglou, Martin Rinard, Eric Lahtinen, Paolo Piselli, Basil Krikeles, Timothy Anderson, Greg Sullivan. United States Patent 7,260,746. Specification based detection and repair of errors in data structures. Brian Demsky, Martin Rinard. SK, MP, and JW are full-time employees of TriNetX, LLC. SK and JW own TriNetX stock. The remaining authors declare no competing interests.

Funding Statement

This work was supported by the Coverys Community Healthcare Foundation and TriNetX. TriNetX contributed resources including secured laptop computers, access to the TriNetX EHR database, and clinical, technical, legal, and administrative assistance from the TriNetX team of clinical informaticists, engineers, and technical staff. SK, MP, and JW are employees of TriNetX and were involved in study design and data acquisition. The Coverys Community Healthcare Foundation was not involved in study design, collection, analysis, interpretation of data, writing of the report, or in the decision to submit the article for publication.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All EHR data were obtained through the federated global health research network, TriNetX, and de-identified by TriNetX. We accessed the data under a no-cost collaboration agreement. Based on a determination by the Western IRB, studies using TriNetX data are not considered to be human subject research, and are therefore exempt from IRB review.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • * Co-senior authors

  • (jiakai{at}mit.edu)

  • (bogu{at}bwh.harvard.edu)

  • (zoomswk{at}mit.edu)

  • (steve.kundrot{at}trinetx.com)

  • (matvey.palchuk{at}trinetx.com)

  • (jeff.warnick{at}trinetx.com)

  • (ikaplan{at}bidmc.harvard.edu)

  • (rinard{at}csail.mit.edu)


Data Availability

The de-identified data in the TriNetX federated network database can only be accessed by researchers who are either part of the network or have a collaboration agreement with TriNetX. As stated in the manuscript, we accessed data as part of a no-cost collaboration agreement between BIDMC, MIT, and TriNetX.

  • Abbreviations

    EHR: electronic health record
    HCC: hepatocellular carcinoma
    GP: general population
    NAFLD: non-alcoholic fatty liver disease
    HCO: health care organization
    LR: logistic regression
    NN: neural network
    AI: artificial intelligence
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
    Posted May 29, 2024.
    medRxiv 2024.05.28.24307949; doi: https://doi.org/10.1101/2024.05.28.24307949