PT - JOURNAL ARTICLE
AU - Van Phan, Hoang
AU - Spottiswoode, Natasha
AU - Lydon, Emily C.
AU - Chu, Victoria T.
AU - Cuesta, Adolfo
AU - Kazberouk, Alexander D.
AU - Richmond, Natalie L.
AU - Calfee, Carolyn S.
AU - Langelier, Charles R.
TI - Integrating a host transcriptomic biomarker with a large language model for diagnosis of lower respiratory tract infection
AID - 10.1101/2024.08.28.24312732
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.08.28.24312732
4099 - http://medrxiv.org/content/early/2024/08/29/2024.08.28.24312732.short
4100 - http://medrxiv.org/content/early/2024/08/29/2024.08.28.24312732.full
AB - Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide. Despite this, diagnosing LRTI remains challenging, particularly in the intensive care unit, where non-infectious respiratory conditions can present with similar features. Here, we tested a new method for LRTI diagnosis that combines the transcriptomic biomarker FABP4 with assessment of text from the electronic medical record (EMR) using the large language model Generative Pre-trained Transformer 4 (GPT-4). We evaluated this methodology in a prospective cohort of critically ill adults with acute respiratory failure, in which we measured pulmonary FABP4 expression and identified patients with LRTI or non-infectious conditions using retrospective adjudication. A diagnostic classifier combining FABP4 and GPT-4 achieved an area under the receiver operating characteristic curve (AUC) of 0.92 ± 0.06 by five-fold cross-validation (CV), outperforming classifiers based on FABP4 expression alone (AUC 0.83) or GPT-4 alone (AUC 0.84). At Youden’s index within each CV fold, the combined classifier achieved a mean sensitivity of 92% ± 7%, specificity of 90% ± 17%, and accuracy of 91% ± 8%.
Taken together, our findings suggest that combining a host transcriptional biomarker with interpretation of EMR data using artificial intelligence is a promising new approach to infectious disease diagnosis.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: 5R01HL155418 (CRL); Chan Zuckerberg Biohub San Francisco (CRL); R35HL140026 (CSC)
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The cohort was approved by the University of California Institutional Review Board (protocol #10-02701) and informed consent was obtained from patients or surrogate decision makers.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
The gene count data are available at https://github.com/infectiousdisease-langelier-lab/LRTI_FABP4_classifier. Source data are provided in the source data file.
https://github.com/infectiousdisease-langelier-lab/LRTI_FABP4_GPT4_classifier
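
The abstract reports sensitivity and specificity at Youden's index within each cross-validation fold. As a rough illustration of that one step, the sketch below selects the score threshold that maximizes J = sensitivity + specificity - 1 for a single fold. It is not the authors' code: the function name `youden_threshold`, the rule that a score at or above the threshold counts as positive, and the labels and scores are all assumptions made up for this example.

```python
from typing import List, Tuple


def youden_threshold(labels: List[int], scores: List[float]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    Hypothetical helper for illustration. A sample is called positive
    when its score is >= threshold; ties in J keep the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_j = float("-inf")
    best = (0.0, 0.0, 0.0)
    # Every distinct observed score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1  # Youden's J statistic
        if j > best_j:
            best_j = j
            best = (t, sens, spec)
    return best


# Made-up fold: 1 = LRTI, 0 = non-infectious; scores are classifier outputs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]
threshold, sensitivity, specificity = youden_threshold(labels, scores)
print(threshold, sensitivity, specificity)
```

In a five-fold setting this selection would be repeated on each fold and the resulting sensitivities and specificities averaged, which is one plausible reading of the "mean ± spread" figures quoted in the abstract.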