Large-language-model-based 10-year risk prediction of cardiovascular disease: insight from the UK biobank data

Changho Han, Dong Won Kim, Songsoo Kim, Seng Chan You, SungA Bae, Dukyong Yoon
doi: https://doi.org/10.1101/2023.05.22.23289842
Changho Han
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yongin, Republic of Korea
Dong Won Kim
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yongin, Republic of Korea
Songsoo Kim
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yongin, Republic of Korea
Seng Chan You
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yongin, Republic of Korea
2Institute for Innovation in Digital Healthcare (IIDH), Severance Hospital, Seoul, Republic of Korea
SungA Bae
3Department of Cardiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
4Center for Digital Health, Yongin Severance Hospital, Yonsei University Health System, Yongin, Republic of Korea
  • For correspondence: dukyong.yoon{at}yonsei.ac.kr bsaking{at}naver.com
Dukyong Yoon
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yongin, Republic of Korea
2Institute for Innovation in Digital Healthcare (IIDH), Severance Hospital, Seoul, Republic of Korea
4Center for Digital Health, Yongin Severance Hospital, Yonsei University Health System, Yongin, Republic of Korea
  • For correspondence: dukyong.yoon{at}yonsei.ac.kr bsaking{at}naver.com

Abstract

Background Conventional cardiovascular risk prediction models provide insights into population-level risk factors and have been widely adopted in clinical practice. However, these models have limited generalizability and flexibility. Large language models (LLMs) have demonstrated remarkable proficiency across a wide range of industries.

Methods In this study, we investigated the feasibility of using LLMs such as ChatGPT-3.5, ChatGPT-4, and Bard to predict a patient's 10-year cardiovascular risk. We used data from the UK Biobank cohort, a major biomedical database in the UK, and, for additional validation and multi-institutional research, from the Korean Genome and Epidemiology Study (KoGES), a large-scale prospective study in Korea. These databases provided a wide array of information, including age, sex, medical history, lipid profile, blood pressure, and physical measurements. From these data, we generated natural-language sentences describing each individual and input them into the LLMs to obtain risk predictions. Using these real-world data, we compared the performance of the LLMs with that of the Framingham Risk Score (FRS), a conventional risk prediction model. For both the LLMs and the FRS, we evaluated accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Their performance in predicting 10-year cardiovascular risk was further compared through Kaplan–Meier survival analysis and Cox hazard ratio analysis.
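The step of turning a tabular record into a sentence for the LLM can be sketched roughly as follows. This is a minimal illustration only: the `Participant` fields, the units, and the wording of the question are assumptions made for the example, not the prompt or data schema used in the study, and no call to an LLM service is made.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record; field names and units are illustrative only."""
    age: int
    sex: str
    systolic_bp: float        # mmHg (assumed unit)
    total_cholesterol: float  # mg/dL (assumed unit)
    hdl_cholesterol: float    # mg/dL (assumed unit)
    smoker: bool
    diabetic: bool


def describe(p: Participant) -> str:
    """Turn one tabular record into a plain-English description."""
    smoking = "a current smoker" if p.smoker else "not a current smoker"
    diabetes = "has diabetes" if p.diabetic else "does not have diabetes"
    return (
        f"The patient is a {p.age}-year-old {p.sex} with a systolic blood "
        f"pressure of {p.systolic_bp:g} mmHg, total cholesterol of "
        f"{p.total_cholesterol:g} mg/dL, and HDL cholesterol of "
        f"{p.hdl_cholesterol:g} mg/dL. The patient is {smoking} and {diabetes}."
    )


def build_prompt(p: Participant) -> str:
    """Append a risk question; this wording is an assumption, not the study's."""
    return (
        describe(p)
        + " Is this patient's 10-year risk of cardiovascular disease low, "
        "intermediate, or high? Answer with one word."
    )


# Made-up example values, not taken from UK Biobank or KoGES.
example = Participant(58, "male", 142, 215, 42, True, False)
print(build_prompt(example))
```

Keeping the description and the question in separate functions makes it straightforward to drop variables from the description, which is the kind of limited-variable comparison the abstract mentions.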

Findings GPT-4 achieved performance comparable to the FRS in cardiovascular risk prediction in both the UK Biobank {accuracy (0·834 vs 0·773) and F1 score (0·138 vs 0·132)} and KoGES {accuracy (0·902 vs 0·874)}. The Kaplan–Meier survival analysis of GPT-4 demonstrated distinct survival patterns among groups, which revealed a strong association between the GPT risk prediction output and survival outcomes. The additional analysis of limited variables using GPT-3.5 indicated that ChatGPT’s prediction performance was preserved despite the omission of a few variables from the prompt, especially when physical measurement data were omitted.
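The reported metrics all derive from a binary confusion matrix, and a short sketch may help show why accuracy can be high while the F1 score stays low when events are rare. The counts below are invented for illustration and are not the study's results; the function name is likewise hypothetical.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute the six metrics named in the abstract from confusion-matrix counts."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Hypothetical counts for a rare outcome (30 events in 1000 people).
metrics = classification_metrics(tp=10, fp=90, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.890 while F1 is about 0.154, because the many true negatives raise accuracy but do not enter the F1 calculation.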

Interpretation This study suggests that ChatGPT can achieve performance comparable to that of conventional models in predicting cardiovascular risk. Furthermore, ChatGPT offers greater accessibility and flexibility and can provide user-friendly outputs. As LLMs such as ChatGPT evolve, future studies should focus on applying them to various medical scenarios and subsequently optimizing their performance.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI22C0452).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

UK Biobank: https://www.ukbiobank.ac.uk/
Korean Genome and Epidemiology Study (KoGES) data: https://nih.go.kr/eng/main/main.do

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present work are contained in the manuscript.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted May 24, 2023.

Subject Area

  • Cardiovascular Medicine