medRxiv
Rapid Identification and Phenotyping of Nonalcoholic Fatty Liver Disease Patients Using an Algorithmic Approach in Diverse, Urban Healthcare Systems

Anna O. Basile, Anurag Verma, Leigh Anne Tang, Marina Serper, Andrew Scanga, Ava Farrell, Brittney Destin, Rotonya M. Carr, Anuli Anyanwu-Ofili, Gunaretnam Rajagopal, Abraham Krikhely, Marc Bessler, Muredach P. Reilly, Marylyn D. Ritchie, Nicholas P. Tatonetti, Julia Wattacheril
doi: https://doi.org/10.1101/2021.04.27.21256139
Anna O. Basile
1Department of Biomedical Informatics, Columbia University NY, NY
2Department of Computational Biology, New York Genome Center NY, NY
PhD,MA
Anurag Verma
3Department of Medicine, Division of Translational Medicine and Human Genetics and Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
PhD
Leigh Anne Tang
4Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
MS
Marina Serper
5Division of Gastroenterology and Hepatology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
MD,MS
Andrew Scanga
6Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
MD
Ava Farrell
7Department of Surgery, Division of Metabolic and Bariatric Surgery, Columbia University Irving Medical Center NY, NY
LMSW
Brittney Destin
7Department of Surgery, Division of Metabolic and Bariatric Surgery, Columbia University Irving Medical Center NY, NY
BS
Rotonya M. Carr
8Department of Medicine, Division of Gastroenterology, University of Washington, Seattle, WA
MD,FACP
Anuli Anyanwu-Ofili
9Janssen Pharma R&D LLC, Spring House, PA
PhD
Gunaretnam Rajagopal
9Janssen Pharma R&D LLC, Spring House, PA
PhD
Abraham Krikhely
7Department of Surgery, Division of Metabolic and Bariatric Surgery, Columbia University Irving Medical Center NY, NY
MD
Marc Bessler
7Department of Surgery, Division of Metabolic and Bariatric Surgery, Columbia University Irving Medical Center NY, NY
MD,FACS
Muredach P. Reilly
10Irving Institute for Clinical and Translational Research, Columbia University, New York, NY 10032, USA
11Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NY, NY
MB,MS
Marylyn D. Ritchie
12Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
PhD,MS
Nicholas P. Tatonetti
1Department of Biomedical Informatics, Columbia University NY, NY
13Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA
14Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
PhD
  • For correspondence: jjw2151{at}cumc.columbia.edu nicholas.tatonetti{at}csmc.edu
Julia Wattacheril
15Division of Digestive and Liver Diseases, Department of Medicine, Center for Liver Disease and Transplantation, Columbia University Irving Medical Center, NY, NY
MD,MPH
  • For correspondence: jjw2151{at}cumc.columbia.edu nicholas.tatonetti{at}csmc.edu

Abstract

Objectives Nonalcoholic Fatty Liver Disease (NAFLD) is the most common global cause of chronic liver disease. Therapeutic interventions are rapidly advancing for its inflammatory phenotype, nonalcoholic steatohepatitis (NASH), at all stages of disease. Diagnosis codes alone fail to accurately recognize and stratify at-risk patients. Our work aims to rapidly identify NAFLD patients within large electronic health record (EHR) databases for automated stratification and targeted intervention based on clinically relevant phenotypes.

Methods We present a rule-based phenotyping algorithm for the rapid identification of NAFLD patients developed using EHRs from 6.4 million patients at Columbia University Irving Medical Center (CUIMC) and validated at two independent healthcare centers. The algorithm uses the Observational Medical Outcomes Partnership (OMOP) Common Data Model and queries multiple structured and unstructured data elements, including diagnosis codes, laboratory measurements, radiology and pathology modalities.
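As a rough illustration of what a rule-based phenotype spanning several data domains can look like, the sketch below combines evidence from diagnosis codes, radiology text, and pathology text, and then applies an exclusion flag. The abstract does not list the actual rules, so everything here is an assumption made for illustration only: the `PatientEvidence` structure, the keyword patterns, the simple negation check, and the exclusion logic are hypothetical. The two ICD-10-CM codes (K76.0 for fatty liver, K75.81 for NASH) are standard codes, but whether and how the authors use them is not stated in the source.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule inputs; a real implementation would read these from
# OMOP CDM tables rather than from in-memory objects.
NAFLD_DX_CODES = {"K76.0", "K75.81"}  # ICD-10-CM: fatty liver, NASH
STEATOSIS = re.compile(r"\b(hepatic steatosis|fatty liver|steatohepatitis)\b", re.I)
NEGATED = re.compile(r"\bno (evidence of )?(hepatic steatosis|fatty liver)\b", re.I)


@dataclass
class PatientEvidence:
    """Illustrative container for one patient's structured and text data."""
    dx_codes: set = field(default_factory=set)
    radiology_reports: list = field(default_factory=list)
    pathology_reports: list = field(default_factory=list)
    has_exclusion: bool = False  # assumed: e.g. a competing liver disease


def mentions_steatosis(text):
    """True if the text mentions steatosis and is not an explicit negation."""
    return bool(STEATOSIS.search(text)) and not NEGATED.search(text)


def classify(patient):
    """Return (label, evidence_sources) under the assumed rules above."""
    if patient.has_exclusion:
        return ("excluded", [])
    sources = []
    if patient.dx_codes & NAFLD_DX_CODES:
        sources.append("diagnosis_code")
    if any(mentions_steatosis(t) for t in patient.radiology_reports):
        sources.append("radiology")
    if any(mentions_steatosis(t) for t in patient.pathology_reports):
        sources.append("pathology")
    return ("nafld", sources) if sources else ("not_identified", [])


# A patient with no NAFLD diagnosis code but a positive imaging report.
example = PatientEvidence(radiology_reports=["Diffuse hepatic steatosis."])
print(classify(example))
```

Tracking which source produced the match is one way to count patients who would be missed by diagnosis codes alone, which is the comparison the Results report.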

Results Our approach identified 16,006 CUIMC NAFLD patients, 10,753 (67%) of whom were previously unidentifiable by NAFLD diagnosis codes. Fibrosis scoring on patients without histology identified 943 subjects with scores indicative of advanced fibrosis (FIB-4, APRI, NAFLD-FS). The algorithm was validated at two independent healthcare systems, University of Pennsylvania Health System (UPHS) and Vanderbilt University Medical Center (VUMC), where 20,779 and 19,575 NAFLD patients were identified, respectively. Clinical chart review showed a high positive predictive value (PPV) across all healthcare systems (91% at CUIMC, 75% at UPHS, and 85% at VUMC) and a sensitivity of 79.6%.
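For readers unfamiliar with the non-invasive fibrosis scores named above, the following sketch computes FIB-4 and APRI from their standard published formulas. The function names, the example laboratory values, and the cutoffs are assumptions for illustration: the abstract does not state which thresholds the authors used, so the values below (FIB-4 above 2.67, APRI above 1.0) are commonly cited ones rather than the paper's. NAFLD-FS is omitted to keep the example short.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_upper_limit_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) x 100 / platelets."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_10e9_l


# Assumed cutoffs for "indicative of advanced fibrosis"; not taken from the paper.
FIB4_CUTOFF = 2.67
APRI_CUTOFF = 1.0


def flags_advanced_fibrosis(age_years, ast, alt, platelets, ast_uln=40.0):
    """True if either score exceeds its assumed cutoff."""
    return (fib4(age_years, ast, alt, platelets) > FIB4_CUTOFF
            or apri(ast, ast_uln, platelets) > APRI_CUTOFF)


# Hypothetical patient: 60 years old, AST 80 U/L, ALT 64 U/L, platelets 120 x 10^9/L.
print(fib4(60, 80, 64, 120))            # 5.0
print(round(apri(80, 40, 120), 2))      # 1.67
print(flags_advanced_fibrosis(60, 80, 64, 120))  # True
```

Both scores need only age and routine laboratory values, which is why they can be applied in bulk to patients without histology.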

Conclusions Our rule-based algorithm provides an accurate, automated approach for rapidly identifying, stratifying, and sub-phenotyping NAFLD patients within a large EHR system.

WHAT IS KNOWN

  • NAFLD is the leading form of chronic liver disease with a rising prevalence in the population.

  • NAFLD is often under-recognized in at-risk individuals, including within healthcare settings.

  • Current means of identification and stratification are complex and dependent on provider recognition of clinical risk factors.

WHAT IS NEW HERE

  • An accurate, validated rule-based algorithm for the high-throughput and rapid EHR identification of NAFLD patients.

  • Rapid discovery of a NAFLD cohort from diverse EHR systems comprising approximately 12.1 million patients.

  • Our algorithm has high performance (mean PPV=85%, sensitivity=79.6%) in NAFLD patient discovery.

  • The majority of algorithmically derived NAFLD patients were previously unidentified within healthcare systems.

  • Computational stratification of individuals with advanced fibrosis can be achieved rapidly.
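The performance figures in this list follow the usual definitions of positive predictive value and sensitivity. The sketch below states those definitions and recomputes the share of the CUIMC cohort that lacked a NAFLD diagnosis code from the counts given in the Results. The chart-review counts passed to `ppv` and `sensitivity` are hypothetical, chosen only so the outputs match the reported percentages; the abstract does not give the underlying counts.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed cases among all flagged patients."""
    return true_positives / (true_positives + false_positives)


def sensitivity(true_positives, false_negatives):
    """Sensitivity: flagged cases among all true cases."""
    return true_positives / (true_positives + false_negatives)


# Counts reported in the Results for CUIMC.
identified = 16006
without_dx_code = 10753
print(round(100 * without_dx_code / identified))  # 67

# Hypothetical chart-review counts (not from the paper).
print(ppv(true_positives=91, false_positives=9))            # 0.91
print(sensitivity(true_positives=199, false_negatives=51))  # 0.796
```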

Competing Interest Statement

JW has received research support from Janssen, Galectin, Intercept, Genfit, Shire, Conatus, Zydus, and has served on the advisory board for Astra Zeneca/MedImmune, AMRA. RMC has received research support from Intercept Pharmaceuticals and Merck, Inc. MS is funded by NIDDK R01DK132138, R01DK131547 and has an unrestricted grant from Grifols, SA. GR retired from Janssen Pharma R&D as Scientific Fellow and Head of Computational Sciences and is currently a Venture Partner in Samsara BioCapital, Palo Alto, CA. AK is a speaker and proctor for Intuitive, reviewer for surgical videos for Crowd Sourced Assessment of Technical Skills (CSATs), and a consultant for Johnson and Johnson and Surgical Specialties Corporation. AV, LT, AS, AF, BD, AA, GR, AK, MB, MPR, MDR, JD, and NPT have nothing to disclose. Patent for algorithm to Columbia University Trustees; © 2021 The Trustees of Columbia University in the City of New York. The owner has no objection to reproduction of the work for academic non-commercial purposes, but otherwise reserves all copyright rights whatsoever. JW, NPT and AOB are co-inventors.

Funding Statement

Funding was provided by Janssen Research and Development in collaboration with Columbia University Irving Medical Center. The sponsor was involved in study concept and design. This publication was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Number UL1TR001873. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was approved by the Institutional Review Boards of Columbia University Irving Medical Center, the University of Pennsylvania Health System, and Vanderbilt University Medical Center.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Financial Support: Funding was provided by Janssen Research and Development in collaboration with Columbia University Irving Medical Center. The sponsor was involved in study concept and design. This publication was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Number UL1TR001873. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH).

    NPT is supported by R35GM131905.

    RMC is supported by R01AA026302.

    MDR is supported by R01GM138597, UL1-TR-001878, R01HG010067, and R01HG012670.

  • Conflict of Interest and Disclosures: Patent for algorithm to Columbia University Trustees; © 2021 The Trustees of Columbia University in the City of New York. The owner has no objection to reproduction of the work for academic non-commercial purposes, but otherwise reserves all copyright rights whatsoever. JW, NPT and AOB are co-inventors.

    JW has received research support from Janssen, Galectin, Intercept, Genfit, Shire, Conatus, Zydus, and is on the advisory board for Astra Zeneca/MedImmune, AMRA.

    RMC has received research support from Intercept Pharmaceuticals and Merck, Inc.

    MS is funded by NIDDK R01DK132138, R01DK131547 and has an unrestricted grant from Grifols, SA.

    GR retired from Janssen Pharma R&D as Scientific Fellow and Head of Computational Sciences and is currently a Venture Partner in Samsara BioCapital, Palo Alto, CA.

    AK is a speaker and proctor for Intuitive, reviewer for surgical videos for Crowd Sourced Assessment of Technical Skills (CSATs), and a consultant for Johnson and Johnson and Surgical Specialties Corporation.

    AV, LT, AS, AF, BD, AA, GR, AK, MB, MPR, MDR, JD, and NPT have nothing to disclose.

  • Guarantor of the article: Julia Wattacheril

  • Data Transparency Statement: Algorithmic code is available for academic, non-commercial collaborations by request to the corresponding authors.

  • This version of the manuscript has been revised to update the sensitivity analysis, fibrosis scoring metric results, and algorithmic performance assessments. Sensitivity is now assessed across multiple categories reflecting data that is accessible within the clinical data warehouse and the clinical hepatology patient registry. We have expanded the sections on fibrosis scores and assess the proportion of patients with advanced scores (FIB-4, APRI, NAFLD-FS) who also have a diagnosis of obesity, type 2 diabetes/dysglycemia, abnormal liver enzymes, hyperlipidemia, and hypertension. Our NAFLD algorithm is now compared to a simpler algorithm using a diagnostic combination of obesity, type 2 diabetes, and abnormal ALT values, identifying improved performance of our more complex phenotyping algorithm in discovering NAFLD patients. Additionally, our NAFLD algorithm was rerun on a more recent dataset from Columbia University Irving Medical Center (CUIMC), which identified additional patients with biopsy-proven NASH. This revision includes updated figures, supplemental methods, revised text in the introduction, methods, results, and discussion sections.

Data Availability

Data in the study were extracted from patient electronic health records at Columbia University Irving Medical Center (CUIMC), University of Pennsylvania Health System (UPHS), and Vanderbilt University Medical Center (VUMC). These data are not available for public use due to institutional privacy policies and federal regulations.

  • Abbreviations

    EHR: Electronic Health Record
    NAFLD: nonalcoholic fatty liver disease
    NASH: nonalcoholic steatohepatitis
    OMOP: Observational Medical Outcomes Partnership
    CDM: Common Data Model
    OHDSI: Observational Health Data Sciences and Informatics
    A1c: Glycated hemoglobin
    T2D: Type 2 Diabetes
    DOB: Date of Birth
    CUIMC: Columbia University Irving Medical Center
    UPHS: University of Pennsylvania Health System
    VUMC: Vanderbilt University Medical Center
    MRN: Medical Record Number
    FIB-4: Fibrosis-4
    APRI: Aspartate transaminase to Platelet Ratio Index
    NIT: non-invasive test
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
    Posted April 03, 2023.
    Subject Area

    • Gastroenterology