medRxiv

Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys

Yuntian Liu, Jeph Herrin, Chenxi Huang, Rohan Khera, Lovedeep Singh Dhingra, Weilai Dong, Bobak J. Mortazavi, Harlan M. Krumholz, Yuan Lu
doi: https://doi.org/10.1101/2022.09.30.22280471
Yuntian Liu
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
MPH
Jeph Herrin
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
PhD
Chenxi Huang
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
PhD
Rohan Khera
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
MD, MS
Lovedeep Singh Dhingra
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
MBBS
Weilai Dong
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
PhD
Bobak J. Mortazavi
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
4Department of Computer Science and Engineering, Texas A&M University, College Station, TX
5Center for Remote Health Technologies and Systems, Texas A&M University, College Station, TX
PhD
Harlan M. Krumholz
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
3Department of Health Policy and Management, Yale School of Public Health, New Haven, CT
MD, SM
Yuan Lu
1Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT
2Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
ScD
  • For correspondence: y.lu{at}yale.edu

ABSTRACT

Background Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.

Methods We used 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanation (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.
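To make the SHAP step concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is an illustration of the underlying attribution idea only, not the authors' pipeline: the SHAP library uses faster model-specific algorithms, and the function `toy_predict`, its coefficients, the three features, and the baseline values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by all coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_predict(features):
    """Hypothetical linear VO2 max predictor (ml/kg/min); coefficients are made up."""
    age, bmi, resting_hr = features
    return 60.0 - 0.3 * age - 0.5 * bmi - 0.1 * resting_hr


participant = [40.0, 28.0, 70.0]   # hypothetical age, BMI, resting heart rate
baseline = [32.5, 25.0, 65.0]      # hypothetical reference participant
phi = shapley_values(toy_predict, participant, baseline)
```

The attributions in `phi` sum to the difference between the participant's prediction and the baseline prediction, which is the additivity property that lets per-feature contributions be ranked to identify key predictors.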

Results Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) had the best performance among the supervised ML algorithms evaluated. Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P<0.01 for both).
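The quantities reported above (an RMSE, its 95% confidence interval, and a relative error reduction) can be sketched as follows. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names and synthetic data are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the RMSE (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample participants with replacement and recompute the RMSE
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def relative_reduction(rmse_new, rmse_reference):
    """Fraction by which the new model lowers the reference model's RMSE."""
    return (rmse_reference - rmse_new) / rmse_reference


# Synthetic test-set values, for illustration only
data_rng = random.Random(42)
y_true = [data_rng.gauss(40.0, 9.0) for _ in range(200)]
y_pred = [t + data_rng.gauss(0.0, 8.0) for t in y_true]

point = rmse(y_true, y_pred)
ci_low, ci_high = bootstrap_rmse_ci(y_true, y_pred)
```

For example, `relative_reduction(8.0, 10.0)` returns 0.2, meaning a 20% lower RMSE than a reference model with an RMSE of 10.0.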

Conclusion Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms.

What is Known

  • Although cardiorespiratory fitness is recognized as an important marker of cardiovascular health, it is not routinely measured because of the time and resources required to perform exercise tests.

  • Non-exercise algorithms are cost-effective alternatives to estimate cardiorespiratory fitness, but the existing models are restricted in generalizability and predictive power.

What the Study Adds

  • We improve non-exercise algorithms for cardiorespiratory fitness prediction using advanced ML methods and a more comprehensive and representative data source from U.S. national population surveys.

  • More health factors that are associated with cardiorespiratory fitness are newly identified.

  • Nationally representative estimates of cardiorespiratory fitness in the U.S. over the past 20 years are generated.

Competing Interest Statement

In the past three years, Harlan Krumholz received expenses and/or personal fees from UnitedHealth, Element Science, Aetna, Reality Labs, Tesseract/4Catalyst, F-Prime, the Siegfried and Jensen Law Firm, Arnold and Porter Law Firm, and Martin/Baughman Law Firm. He is a co-founder of Refactor Health and HugoHealth, and is associated with contracts, through Yale New Haven Hospital, from the Centers for Medicare & Medicaid Services and through Yale University from Johnson & Johnson. Bobak Mortazavi received expenses and/or personal fees from HugoHealth, as a consultant. Dr. Khera receives support from the National Heart, Lung, and Blood Institute of the National Institutes of Health under award, 1K23HL153775, and is a founder of Evidence2Health, a precision health and digital health analytics platform. The other co-authors report no potential competing interests.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used (or will use) ONLY openly available human data that were originally located at: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

All data produced are available online at https://wwwn.cdc.gov/nchs/nhanes/Default.aspx

  • Non-standard Abbreviations and Acronyms

    CRF: Cardiorespiratory fitness
    VO2 max: Maximal oxygen uptake
    CPX: Cardiopulmonary exercise testing
    ML: Machine learning
    NHANES: National Health and Nutrition Examination Survey
    STROBE: Strengthening the Reporting of Observational Studies in Epidemiology
    COVID-19: Coronavirus disease 2019
    MEC: Mobile Examination Center
    KNN: K-Nearest Neighbors
    LASSO: Least Absolute Shrinkage and Selection Operator
    SVR: Support Vector Regression
    RF: Random Forest
    GBDT: Gradient Boosting Decision Tree
    XGBoost: Extreme Gradient Boosting
    LightGBM: Light Gradient Boosting Machine
    SHAP: Shapley additive explanation
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
    Posted October 04, 2022.
    Subject Area

    • Health Informatics