medRxiv

Individualized melanoma risk prediction using machine learning with electronic health records

Guihong Wan, Sara Khattab, Katie Roster, Nga Nguyen, Boshen Yan, Hannah Rashdan, Hossein Estiri, Yevgeniy R. Semenov
doi: https://doi.org/10.1101/2024.07.26.24311080
Guihong Wan, PhD¹
Sara Khattab, BS¹
Katie Roster, MD¹
Nga Nguyen, MD, MPH¹
Boshen Yan, MBI¹
Hannah Rashdan, MD¹
Hossein Estiri, PhD²
Yevgeniy R. Semenov, MD, MA¹

¹Department of Dermatology, Massachusetts General Hospital, Harvard Medical School
²Department of Medicine, Massachusetts General Hospital, Harvard Medical School

For correspondence: ysemenov{at}mgh.harvard.edu

ABSTRACT

Background Melanoma is a lethal form of skin cancer with a high propensity for metastasizing, making early detection crucial. This study aims to develop a machine learning model that uses electronic health record (EHR) data to identify patients at high risk of developing melanoma so that they can be prioritized for dermatology screening.

Methods This retrospective study included patients diagnosed with melanoma (cases), as well as matched patients without melanoma (controls), from Massachusetts General Hospital (MGH), Brigham and Women’s Hospital (BWH), Dana-Farber Cancer Institute (DFCI), and other hospital centers within the Research Patient Data Registry at the Mass General Brigham healthcare system between 1992 and 2022. Patient demographics, family history, diagnoses, medications, procedures, laboratory tests, reasons for visits, and allergy data six months prior to the date of first melanoma diagnosis or date of censoring were extracted. A machine learning framework for health outcomes (MLHO) was used to build the model. Performance was evaluated in two ways: five-fold cross-validation within the MGH cohort (internal validation), and training on the MGH cohort with independent testing on the non-MGH cohort (external validation). The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were computed, along with 95% confidence intervals (CIs).
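The abstract does not state how the 95% CIs were obtained or which AUC-PR estimator was used. As a minimal sketch, assuming a percentile bootstrap for the CIs and the step-wise average-precision estimator for AUC-PR, both metrics can be computed from held-out case/control labels and predicted risk scores as below. The function names (`auc_roc`, `average_precision`, `bootstrap_ci`) are hypothetical and are not part of MLHO or the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumed metric signature: (labels, scores) -> value; labels are 1 = case, 0 = control.
Metric = Callable[[Sequence[int], Sequence[float]], float]


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney U statistic (ties get average ranks)."""
    n = len(scores)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs both cases (1) and controls (0)")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUC-PR: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUC-PR needs at least one case (1)")
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every sample tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def bootstrap_ci(metric: Metric, y_true: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) CI (assumed method)."""
    if len(set(y_true)) < 2:
        raise ValueError("bootstrap needs both classes in the data")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(metric(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(y_true, scores), lo, hi


if __name__ == "__main__":
    # Toy held-out fold: 1 = melanoma case, 0 = matched control.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc_roc(y, s))            # 0.75
    print(average_precision(y, s))  # 0.8333...
    print(bootstrap_ci(auc_roc, y, s, n_boot=200))
```

On the study data, `y_true` would hold the case/control labels of a held-out MGH fold (internal validation) or of the non-MGH cohort (external validation), and `scores` the model's predicted risks. The rank-based AUC avoids comparing every case with every control, which keeps repeated bootstrap resampling practical at cohort sizes in the thousands.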

Results This study identified 10,778 patients with melanoma and 10,778 matched patients without melanoma; each group included 8,944 patients from MGH and 1,834 from non-MGH hospitals, and both groups had an average follow-up duration of 9 years. In the internal and external validations, the model achieved AUC-ROC values of 0.826 (95% CI: 0.819–0.832) and 0.823 (95% CI: 0.809–0.837) and AUC-PR values of 0.841 (95% CI: 0.834–0.848) and 0.822 (95% CI: 0.806–0.839), respectively. Important risk features included a family history of melanoma, a family history of skin cancer, and a prior diagnosis of benign neoplasm of skin. Conversely, a medical examination without abnormal findings was identified as a protective feature.

Conclusions Machine learning techniques and electronic health records can be effectively used to predict melanoma risk, potentially aiding in identifying high-risk patients and enabling individualized screening strategies for melanoma.

Competing Interest Statement

YRS is an advisory board member or consultant and has received honoraria from Pfizer, Incyte Corporation, Sanofi, Galderma, Castle Biosciences, and Iovance Biotherapeutics.

Funding Statement

This study is supported by the Melanoma Research Alliance Dermatology Fellowship award: https://doi.org/10.48050/pc.gr.157226.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Institutional Review Board of Mass General Brigham gave ethical approval for this work (Protocol #2020P002113).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Prior Presentation: Oral Presentation at the Annual Meeting of the Societies for Investigative Dermatology (SID), 2024.

  • Funding sources: This study is supported by the Melanoma Research Alliance Dermatology Fellowship award: https://doi.org/10.48050/pc.gr.157226.

  • IRB approval status: Reviewed and approved by Mass General Brigham Institutional Review Board (Protocol #2020P002113)

Data Availability

All summary data supporting the findings of this study are available within the article or its supplementary materials. The patient data generated for this study can only be shared in accordance with specific institutional review board requirements. Upon request to the corresponding author, a data sharing agreement can be initiated following institution-specific guidelines.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 27, 2024.
medRxiv 2024.07.26.24311080; doi: https://doi.org/10.1101/2024.07.26.24311080

Subject Area

  • Dermatology