ABSTRACT
Cardiovascular disease has been established as the world’s leading cause of death, causing over 20 million deaths per year. This burden, together with growing awareness of the impact of exposomic risk factors on cardiovascular disease, has led the scientific community to adopt machine learning strategies as a complement to traditional statistical epidemiological studies, which are challenged by the highly heterogeneous and dynamic nature of exposomics data. The principal objective of this work is to identify key pertinent literature and to provide an overview of the breadth of research on machine learning applications to exposomics data, with a focus on cardiovascular diseases. Secondarily, we aimed to identify common limitations and meaningful directions for future research. Overall, this work shows that, although machine learning on exposomics data is under-researched compared with its application to other members of the -omics family, it is increasingly being adopted to investigate different aspects of cardiovascular diseases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Corrections have been applied to citations and figure captions.
Data Availability
This is a scoping review paper.
Abbreviations
- AdaBoost: Adaptive Boosting
- AENET-I: Adaptive Elastic-Net with main effects and pairwise interactions
- AI: Artificial Intelligence
- ANN: Artificial Neural Network
- APS: Average Precision Score
- AUC-PR: Area Under the Precision-Recall Curve
- AUC-ROC: Area Under the Receiver Operating Characteristic Curve
- AUC: Area Under the Curve
- BAG: Bagging (regressor or classifier, depending on context)
- BART: Bayesian Additive Regression Trees
- BKMR: Bayesian Kernel Machine Regression
- BMI: Body Mass Index
- CART: Classification and Regression Tree
- CatBoost: Categorical Boosting
- CNN: Convolutional Neural Network
- CVD: Cardiovascular Disease
- DL: Deep Learning
- DT: Decision Tree
- ELSTM: Enhanced Long Short-Term Memory Model
- EN: Elastic Net
- ERS: Environmental Risk Score
- ExWAS: Exposome-Wide Association Study
- FDR: False Discovery Rate
- FNR: False Negative Rate
- FPR: False Positive Rate
- GB: Gradient Boosting
- GGT: Gamma-Glutamyl Transferase
- GSV: Google Street View
- IDI: Integrated Discrimination Improvement
- IF: Isolation Forest
- KNN: K-Nearest Neighbors
- KOBT: Knockoff Boosted Trees
- LASSO: Least Absolute Shrinkage and Selection Operator
- LDL: Low-Density Lipoprotein
- LGBM: Light Gradient Boosting Machine
- LMEM: Linear Mixed Effects Model
- LOO-CV: Leave-One-Out Cross-Validation
- LR: Logistic Regression
- LSTM: Long Short-Term Memory Model
- MAE: Mean Absolute Error
- MAPE: Mean Absolute Percentage Error
- MCC: Matthews Correlation Coefficient
- MI: Myocardial Infarction
- ML: Machine Learning
- MLP: Multi-Layer Perceptron
- MSE: Mean Squared Error
- MSPE: Mean Squared Prediction Error
- NB: Naïve Bayes
- NPV: Negative Predictive Value
- NRI: Categorical Net Reclassification Improvement
- PCA: Principal Component Analysis
- PRESS: Predicted Residual Error Sum of Squares
- RF: Random Forest
- RMSE: Root Mean Squared Error
- SHAP: SHapley Additive exPlanations
- SVC: Support Vector Classification
- SVM: Support Vector Machines
- XGBoost: Extreme Gradient Boosting
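Several of the abbreviations above denote evaluation metrics. For readers less familiar with them, the following minimal Python sketch shows the standard textbook formulas for some of them (FPR, FNR, NPV, MCC, MAE, MSE, RMSE, MAPE). The helper names `confusion_metrics` and `regression_errors` are hypothetical and for illustration only. The reviewed studies may compute these metrics with library implementations, and variants exist (for example, MAPE reported as a fraction rather than a percentage).

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Rates derived from a 2x2 confusion matrix (binary classification)."""
    # Assumes each denominator below is non-zero, i.e. both classes are present.
    fpr = fp / (fp + tn)  # False Positive Rate: FP / (FP + TN)
    fnr = fn / (fn + tp)  # False Negative Rate: FN / (FN + TP)
    npv = tn / (tn + fn)  # Negative Predictive Value: TN / (TN + FN)
    # Matthews Correlation Coefficient; set to 0 when any marginal total is 0,
    # which is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"FPR": fpr, "FNR": fnr, "NPV": npv, "MCC": mcc}


def regression_errors(y_true: list[float], y_pred: list[float]) -> dict:
    """Point-prediction error metrics for a continuous outcome."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n  # Mean Absolute Error
    mse = sum(r * r for r in residuals) / n   # Mean Squared Error
    rmse = math.sqrt(mse)                     # Root Mean Squared Error
    # Mean Absolute Percentage Error (in %); undefined if any true value is 0.
    mape = 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Illustrative usage with made-up counts and values.
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
print(regression_errors([2.0, 4.0, 5.0], [1.0, 4.0, 7.0]))
```

MCC is included alongside the simple rates because, unlike FPR or NPV alone, it uses all four cells of the confusion matrix, which makes it more informative on imbalanced outcomes such as incident CVD.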