Predicting the Risk of Asthma Development in Youth Using Machine Learning Models

Matthew Xie, Chenliang Xu
doi: https://doi.org/10.1101/2024.06.24.24309438
Matthew Xie
Pittsford Sutherland High School, Pittsford, New York
  • For correspondence: ml12342019{at}gmail.com
Chenliang Xu
Department of Computer Science, University of Rochester, Rochester, New York

Abstract

Asthma is a chronic respiratory disease characterized by wheezing and difficulty breathing that disproportionately affects children, including 4.7 million in the U.S. Asthma prediction models for youth that perform well are currently lacking. This study aims to build machine learning models that better predict asthma development in youth using easily accessible national survey data. We analyzed combined cross-sectional data from the 2021 and 2022 National Health Interview Survey (NHIS), covering 9,716 youth subjects and their corresponding parent information. We built several machine learning models for asthma prediction in youth (XGBoost, Neural Networks, Random Forest, Support Vector Machine (SVM), and Logistic Regression), each with various sampling techniques (under- or over-sampling). We examined the associations between asthma in youth and potential risk factors identified by both Random Forest and the Least Absolute Shrinkage and Selection Operator (LASSO). Among the sampling techniques, undersampling the majority class (subjects without asthma) yielded the best area under the curve (AUC) and F1 scores across the predictive models. Logistic Regression performed best on the undersampled data, with an AUC of 0.7654 and an F1 score of 0.3452. In addition, we identified further important factors associated with asthma development in youth, such as a low family poverty ratio and a parental history of asthma. This study built machine learning models that predict asthma development in youth with an AUC of up to 0.7654, which could support early screening and detection of asthma in youth.
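The abstract names three ingredients that are easy to confuse: undersampling the majority class, the AUC, and the F1 score. The following is a minimal pure-Python sketch of each, run on made-up toy data. It is not the authors' pipeline; the function names (`undersample_majority`, `f1_score`, `roc_auc`), the 1:1 balancing ratio, and the toy labels are all assumptions made for illustration.

```python
import random


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes binary labels coded as 0/1 (hypothetical coding: 1 = asthma).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 10 positives among 100 subjects (made-up, not NHIS data).
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in rows]
balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(len(balanced_labels), sum(balanced_labels))

# Metrics on a tiny hand-made example.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a setup like this, resampling would normally be applied to the training split only, so that the test split keeps the original class imbalance. Under that imbalance a model can rank subjects reasonably well (moderate AUC) while still producing many false positives at a fixed threshold (low F1), which is one plausible reading of the gap between the two reported scores.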

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used only openly available human data that were originally located at the NHIS website: https://www.cdc.gov/nchs/nhis/data-questionnaires-documentation.htm.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced are available online at https://www.cdc.gov/nchs/nhis/data-questionnaires-documentation.htm.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 26, 2024.
Subject Area

  • Public and Global Health