Exposomics and Cardiovascular Diseases: A Scoping Review of Machine Learning Approaches

Katerina D. Argyri; Ioannis K. Gallos; Angelos Amditis; Dimitra D. Dionysiou

doi:10.1101/2024.07.19.24310695

ABSTRACT

Cardiovascular disease is the world’s leading cause of death, responsible for over 20 million deaths per year. This burden, together with growing awareness of the impact of exposomic risk factors on cardiovascular diseases, has led the scientific community to leverage machine learning strategies as a complement to traditional statistical epidemiological studies, which are challenged by the highly heterogeneous and dynamic nature of exposomics data. The principal objective of this work is to identify the key pertinent literature and to provide an overview of the breadth of research on machine learning applications to exposomics data, with a focus on cardiovascular diseases. Secondarily, we aimed to identify common limitations and meaningful directions to be addressed in future work. Overall, this work shows that, although machine learning on exposomics data is under-researched compared with its application to other members of the -omics family, it is increasingly adopted to investigate different aspects of cardiovascular diseases.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

Corrections have been applied to citations and figure captions.

Data Availability

This is a scoping review paper.

Abbreviations

AdaBoost: Adaptive Boosting
AENET-I: Adaptive Elastic-Net with main effects and pairwise interactions
AI: Artificial Intelligence
ANN: Artificial Neural Network
APS: Average Precision Score
AUC-PR: Area Under the Precision Recall Curve
AUC-ROC: Area Under the Receiver Operating Characteristic Curve
AUC: Area Under the Curve
BAG: Bagging (regressor or classifier based on context)
BART: Bayesian Additive Regression Trees
BKMR: Bayesian Kernel Machine Regression
BMI: Body Mass Index
CART: Classification And Regression Tree
CatBoost: Categorical Boosting
CNN: Convolutional Neural Network
CVD: Cardiovascular Disease
GB: Gradient Boosting
DL: Deep Learning
DT: Decision Tree
ELSTM: Enhanced Long Short-Term Memory Model
EN: Elastic Net
ERS: Environmental Risk Score
ExWAS: Exposome-Wide Association Study
FDR: False Discovery Rate
FNR: False Negative Rate
FPR: False Positive Rate
GGT: Gamma-Glutamyl Transferase
GSV: Google Street View
IDI: Integrated Discrimination Improvement
IF: Isolation Forest
KNN: k-Nearest Neighbors
KOBT: Knockoff Boosted Trees
LASSO: Least Absolute Shrinkage and Selection Operator
LDL: Low-Density Lipoproteins
LGBM: Light Gradient Boosting Machine
LMEM: Linear Mixed Effects Model
LOO-CV: Leave-One-Out Cross-Validation
LR: Logistic Regression
LSTM: Long Short-Term Memory Model
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MCC: Matthews Correlation Coefficient
MI: Myocardial Infarction
ML: Machine Learning
MLP: Multi-Layer Perceptron
MSE: Mean-Squared Error
MSPE: Mean-Squared Prediction Error
NB: Naïve Bayes
NPV: Negative Predictive Value
NRI: Categorical Net Reclassification Improvement
PCA: Principal Component Analysis
PRESS
RF: Random Forest
RMSE: Root Mean Squared Error
SHAP: SHapley Additive exPlanations
SVC: Support Vector Classification
SVM: Support Vector Machines
XGBoost: Extreme Gradient Boosting

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.