medRxiv

Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation

Jin Ge, Steve Sun, Joseph Owens, Victor Galvez, Oksana Gologorskaya, Jennifer C. Lai, Mark J. Pletcher, Ki Lai
doi: https://doi.org/10.1101/2023.11.10.23298364
Jin Ge
1Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
MD, MBA
  • For correspondence: jin.ge{at}ucsf.edu
Steve Sun
2UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
Joseph Owens
2UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
PhD
Victor Galvez
2UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
Oksana Gologorskaya
2UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
3Bakar Computational Health Sciences Institute, University of California – San Francisco, San Francisco, CA
MS
Jennifer C. Lai
1Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
MD, MBA
Mark J. Pletcher
4Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, CA
MD, MPH
Ki Lai
2UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
MS

Abstract

Background Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach “specializes” the LLMs and is thought to reduce hallucinations.

Methods We developed “LiVersa,” a liver disease-specific LLM, using our institution’s protected health information (PHI)-compliant text embedding and LLM platform, “Versa.” We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents, which were incorporated into LiVersa. We evaluated LiVersa’s performance by comparing its responses with those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance.
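
The retrieval step of RAG can be illustrated with a minimal sketch. This is not the LiVersa or Versa implementation, which the abstract does not describe. It assumes a toy bag-of-words vector in place of a real text-embedding model, and the passages are placeholders rather than quotations from AASLD documents. The function names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder passages (assumptions for illustration, not guideline text).
chunks = [
    "Placeholder passage about hepatitis B antiviral treatment criteria.",
    "Placeholder passage about hepatocellular carcinoma surveillance with ultrasound.",
    "Placeholder passage about nutrition in cirrhosis.",
]

prompt = build_prompt("When is hepatitis B treatment indicated?", chunks, k=1)
```

In a production system the prompt would then be sent to the LLM. The sketch stops at prompt construction because the model interface is not specified in the source.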

Results LiVersa answered all 10 questions correctly when forced to provide a “yes” or “no” answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions.
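
The forced “yes” or “no” scoring described above could be tallied along the following lines. This is a hedged sketch only: the question identifiers, answer key, and replies shown are hypothetical placeholders, not the study's questions or data, and `normalize` and `score` are illustrative names.

```python
def normalize(answer):
    """Map a free-text reply to 'yes' or 'no'; return None if it starts with neither."""
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,;:!")
    return first if first in ("yes", "no") else None


def score(responses, key):
    """Count replies whose leading yes/no matches the answer key."""
    correct = sum(
        1
        for qid, expected in key.items()
        if normalize(responses.get(qid, "")) == expected
    )
    return correct, len(key)


# Hypothetical placeholders, not the published assessment items.
example_key = {"q1": "yes", "q2": "no"}
example_responses = {"q1": "Yes, because ...", "q2": "No."}
correct, total = score(example_responses, example_key)
```

A tally of this kind checks only the leading yes/no; the correctness of the accompanying justification still requires expert review.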

Discussions In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology, there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.

Competing Interest Statement

The authors of this manuscript have the following potential conflicts of interest to disclose:
  • Dr. Jin Ge receives research support from Merck and Co; and consults for Astellas Pharmaceuticals/Iota Biosciences.
  • Dr. Jennifer C. Lai receives research support from Lipocene and Vir Biotechnologies; receives an education grant from Nestle Nutrition Sciences; serves on an advisory board for Novo Nordisk; and consults for Genfit, Third Rock Ventures, and Boehringer Ingelheim.

Funding Statement

The authors of this study were supported in part by the KL2TR001870 (National Center for Advancing Translational Sciences, Ge), P30DK026743 (UCSF Liver Center Grant, Ge and Lai), UL1TR001872 (National Center for Advancing Translational Sciences, Pletcher), and R01AG059183/K24AG080021 (National Institute on Aging, Lai). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or any other funding agencies. The funding agencies played no role in the analysis of the data or the preparation of this manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The clinical guidelines/guidance documents used in the methods of this manuscript are publicly available.

  • Abbreviations

    AASLD: American Association for the Study of Liver Diseases
    APIs: application programming interfaces
    FHIR: Fast Healthcare Interoperability Resources
    HBV: hepatitis B
    HCC: hepatocellular carcinoma
    GAI: generative artificial intelligence
    GPT: generative pre-trained transformer
    LLMs: large language models
    PHI: protected health information
    RAG: retrieval-augmented generation
    UCSF: University of California, San Francisco
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted November 11, 2023.
    Subject Area

    • Gastroenterology