PT - JOURNAL ARTICLE
AU - Zhang, Lin
AU - Liu, Yue
AU - Wang, Kaiyue
AU - Ou, Xiangqin
AU - Zhou, Jiashun
AU - Zhang, Houliang
AU - Huang, Min
AU - Du, Zhenfang
AU - Qiang, Sheng
TI - Integration of Machine Learning to Identify Diagnostic Genes in Leukocytes for Acute Myocardial Infarction Patients
AID - 10.1101/2023.09.07.23295181
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.09.07.23295181
4099 - http://medrxiv.org/content/early/2023/09/08/2023.09.07.23295181.short
4100 - http://medrxiv.org/content/early/2023/09/08/2023.09.07.23295181.full
AB - Background: Acute myocardial infarction (AMI) has two clinical characteristics: a high rate of missed diagnosis and leukocyte dysfunction. Transcriptional RNA in leukocytes is closely related to the disease course of AMI patients. We hypothesized that transcriptional RNA in leukocytes might provide diagnostic value for AMI. Integration machine learning (IML) was first used to explore genes that discriminate AMI. A clinical study was then performed to validate the results.
Methods: A total of four AMI microarray datasets (derived from the Gene Expression Omnibus) were included in this study (220 samples), with patients with stable coronary artery disease (SCAD) serving as controls. At a ratio of 5:2, GSE59867 formed the training set, while GSE60993, GSE62646, and GSE48060 formed the testing set. IML, proposed in this research, combines six machine learning algorithms: support vector machine (SVM), neural network (NN), random forest (RF), gradient boosting machine (GBM), decision trees (DT), and least absolute shrinkage and selection operator (LASSO). IML served two functions in this research: selecting optimized variables and predicting the class label. Furthermore, 40 individuals were recruited to verify the results.
Results: Thirty-nine differentially expressed genes (DEGs) were identified between controls and AMI individuals in the training set.
Among the thirty-nine DEGs, IML was used to build the classification model and identify candidate genes with overall normalized weights >1. Finally, two genes (AQP9 and SOCS3) showed diagnostic value, with an area under the curve (AUC) > 0.9 in both the training and testing sets. The clinical study verified the significance of AQP9 and SOCS3. Notably, more stenotic coronary arteries or a more severe Killip classification indicated higher levels of these two genes, especially SOCS3. These two genes correlated with two immune cell types, monocytes and neutrophils.
Conclusion: AQP9 and SOCS3 in leukocytes may help distinguish AMI patients from SCAD patients. AQP9 and SOCS3 are closely associated with monocytes and neutrophils, which might contribute to advancing AMI diagnosis and shed light on novel genetic markers. Multicenter, large-sample trials covering multiple clinical characteristics are still needed to confirm their clinical value.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The research was funded by the Suzhou Science & Technology Development Plan (SYSD2019222), the Zhangjiagang Science and Technology Plan Project (ZKS2135), and the Youth Science and Technology Project of Zhangjiagang Municipal Health Commission (ZJGQNKJ202211).
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Ethics Review Committee of Jinghai District Hospital approved the study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
The datasets presented in this study can be found online.
The names of the repositories and GEO numbers can be found below: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59867; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60993; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62646; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48060.
Abbreviations: AUC, Area under the Curve; AMI, Acute Myocardial Infarction; IML, Integration Machine Learning; DEGs, Differentially Expressed Genes; KEGG-GSEA, Kyoto Encyclopedia of Genes and Genomes-Gene Set Enrichment Analysis; GO, Gene Ontology; DO, Disease Ontology; MF, Molecular Function; BP, Biological Process; CC, Cellular Components; SVM, Support Vector Machine; ML, Machine Learning; LASSO, Least Absolute Shrinkage and Selection Operator; RF, Random Forest; GBM, Gradient Boosting Machine; DT, Decision Trees; NN, Neural Network.