PT - JOURNAL ARTICLE
AU - Wang, Andrew
AU - Fulton, Rachel
AU - Hwang, Sy
AU - Margolis, David J.
AU - Mowery, Danielle L.
TI - Patient Phenotyping for Atopic Dermatitis with Transformers and Machine Learning
AID - 10.1101/2023.08.25.23294636
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.08.25.23294636
4099 - http://medrxiv.org/content/early/2023/12/04/2023.08.25.23294636.short
4100 - http://medrxiv.org/content/early/2023/12/04/2023.08.25.23294636.full
AB - Background: Atopic dermatitis (AD) is a chronic skin condition that millions of people around the world live with each day. Research into the causes and treatment of this disease has great potential to benefit these individuals. However, recruitment for AD clinical trials is non-trivial: diagnostic precision and phenotypic definitions vary among clinicians, and clinicians must spend time finding, recruiting, and enrolling patients as study subjects. Thus, there is a need for automatic and effective patient phenotyping for cohort recruitment.
Objective: Our study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD.
Methods: We created a vectorized representation of each patient and trained several supervised machine learning methods to classify whether a patient has AD. Each patient is represented by a vector of either probabilities or binary values, where each value indicates whether the patient meets a different criterion for AD diagnosis.
Results: The most accurate AD classifier, XGBoost (Extreme Gradient Boosting), achieved a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500.
Conclusions: An automated approach for identifying patient cohorts has the potential to accelerate, standardize, and automate patient recruitment for AD studies, thereby reducing clinician burden and informing the discovery of better treatment options for AD.
Competing Interest Statement: David J. Margolis is or recently has been a consultant for Pfizer, Leo, and Sanofi with respect to studies of atopic dermatitis and served on an advisory board for the National Eczema Association.
Funding Statement: This study was partially funded by the National Institutes of Health (NIH) National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) P30-AR069589 as part of the Penn Skin Biology and Diseases Resource-Based Center (Core: David J. Margolis, Danielle L. Mowery).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: IRB of University of Pennsylvania gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
To protect patient privacy, the clinical data is not available.
Abbreviations:
AD: atopic dermatitis
BERT: Bidirectional Encoder Representations from Transformers
EHR: Electronic Health Records
ICD: International Classification of Diseases
UKWP: United Kingdom Working Party
HR: Hanifin and Rajka
AI: Artificial Intelligence
NLP: Natural Language Processing
ML: Machine Learning
MLP: Multi-layer Perceptron
ReLU: Rectified Linear Unit
SGD: Stochastic Gradient Descent
KNN: K-Nearest Neighbors
XGBoost: Extreme Gradient Boosting
AdaBoost: Adaptive Boosting
SVM: Support Vector Machines
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
NPV: Negative Predictive Value
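The abstract reports class-balanced accuracy, precision, and recall, and the abbreviation list names TP, TN, FP, FN, and NPV. As a minimal sketch of how these quantities relate, the Python below computes each metric from binary confusion-matrix counts using the standard definitions. The `ConfusionCounts` class, the function names, and the example counts are all assumptions introduced for illustration. The record does not report the study's confusion matrix; the counts were chosen only because they reproduce the reported figures after rounding, and other counts could do the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, not from the study)."""
    tp: int  # true positives: AD patients classified as AD
    fn: int  # false negatives: AD patients classified as non-AD
    fp: int  # false positives: non-AD patients classified as AD
    tn: int  # true negatives: non-AD patients classified as non-AD


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def precision(c: ConfusionCounts) -> float:
    # Fraction of predicted-AD patients who truly have AD.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Fraction of true AD patients the classifier found (sensitivity).
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of true non-AD patients correctly classified as non-AD.
    return _ratio(c.tn, c.tn + c.fp)


def npv(c: ConfusionCounts) -> float:
    # Negative predictive value: fraction of predicted non-AD who are non-AD.
    return _ratio(c.tn, c.tn + c.fn)


def balanced_accuracy(c: ConfusionCounts) -> float:
    # Class-balanced accuracy: mean of recall and specificity.
    return (recall(c) + specificity(c)) / 2


# ASSUMPTION: illustrative counts only, not taken from the study.
counts = ConfusionCounts(tp=21, fn=7, fp=4, tn=24)

print(f"precision         = {precision(counts):.4f}")          # 0.8400
print(f"recall            = {recall(counts):.4f}")             # 0.7500
print(f"balanced accuracy = {balanced_accuracy(counts):.4f}")  # 0.8036
print(f"NPV               = {npv(counts):.4f}")                # 0.7742
```

Balanced accuracy averages the per-class recall, so it is less sensitive to class imbalance than plain accuracy, which is one reason it is commonly reported for cohort-identification classifiers.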