medRxiv

Predict multicategory causes of death in lung cancer patients using clinicopathologic factors

Fei Deng, Haijun Zhou, Yong Lin, John Heim, Lanlan Shen, Yuan Li, Lanjing Zhang
doi: https://doi.org/10.1101/2020.09.25.20201095
Fei Deng
1School of Electrical and Electronic Engineering, Shanghai Institute of Technology, China
Haijun Zhou
2Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, Texas
Yong Lin
3Rutgers Cancer Institute of New Jersey, Rutgers, New Brunswick, New Jersey
4Department of Biostatistics, Rutgers School of Public Health, Piscataway, New Jersey
John Heim
5Princeton Medical Center, Plainsboro, New Jersey
Lanlan Shen
6Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children’s Nutrition Research Center, Houston, Texas
Yuan Li
7Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai, China
8Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
Lanjing Zhang
3Rutgers Cancer Institute of New Jersey, Rutgers, New Brunswick, New Jersey
5Princeton Medical Center, Plainsboro, New Jersey
9Department of Biological Sciences, Rutgers University, Newark, New Jersey
10Department of Chemical Biology, Rutgers Ernest Mario School of Pharmacy, Rutgers University, Piscataway, Newark
  • For correspondence: lanjing.zhang{at}rutgers.edu lumoxuan2009{at}163.com

Abstract

Background The random forest is a recently developed machine-learning algorithm that is considered superior to other machine-learning and regression models for classification because of its better accuracy. However, it is rarely used for predicting causes of death in lung cancer patients. Moreover, specific causes of death in lung cancer patients are poorly classified or predicted, largely because the outcome is multicategory rather than binary (death versus survival).

Methods We therefore tuned and employed a random forest algorithm (Stata, version 15) to classify and predict specific causes of death in lung cancer patients, using the Surveillance, Epidemiology, and End Results-18 (SEER-18) database and several clinicopathologic factors. Lung cancers diagnosed during 2004 were included because of the completeness of their follow-up and cause-of-death data. The patients were randomly divided into training and validation sets in a 1:1 ratio. We also compared the accuracies of the final random forest and multinomial regression models.
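
The 1:1 random division into training and validation sets can be illustrated with a short sketch. This is a minimal Python example using only the standard library, not the authors' Stata code; the helper name `split_half`, the seed value, and the integer placeholder IDs are assumptions made for illustration.

```python
import random

def split_half(case_ids, seed=2004):
    """Randomly divide cases into equal-sized training and validation sets.

    Hypothetical helper: the seed and the integer IDs are illustrative
    assumptions, not details reported in the abstract.
    """
    rng = random.Random(seed)      # local generator keeps the split reproducible
    shuffled = list(case_ids)      # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# 40,000 placeholder IDs standing in for the selected cases
training, validation = split_half(range(40_000))
print(len(training), len(validation))  # 20000 20000
```

Fixing the seed makes the split repeatable, so a model fitted on the training half can later be checked against exactly the same validation half.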

Results We identified and randomly selected 40,000 lung cancers for the analyses, with 20,000 cases in each set. The causes of death were, in descending order, lung cancer (72.45%), other causes or alive (14.38%), non-lung cancer (6.87%), cardiovascular disease (5.35%), and infection (0.95%). We found that more than 250 iterations and 10 variables produced the best prediction, with a best accuracy of 69.8% (error rate 30.2%). The final random forest model with 300 iterations and 10 variables reached a higher accuracy than the multinomial regression model (69.8% vs 64.6%). The top-10 most important factors in the random forest model included sex, chemotherapy status, age (65+ vs <65 years), radiotherapy status, nodal status, T category, histology type and laterality, which were also independently associated with the 5-category causes of death.
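
Accuracy in this setting is the share of cases whose predicted category matches the recorded one, and the error rate is its complement. The sketch below shows that arithmetic in Python under stated assumptions: the function name `accuracy` is hypothetical and the labels are invented for illustration, not SEER data.

```python
def accuracy(actual, predicted):
    """Fraction of cases whose predicted category equals the recorded one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Toy labels, invented for illustration only (not SEER data)
actual = ["lung cancer", "lung cancer", "cardiovascular disease",
          "infection", "lung cancer"]
predicted = ["lung cancer", "lung cancer", "lung cancer",
             "infection", "non-lung cancer"]

acc = accuracy(actual, predicted)
error_rate = 1 - acc               # error rate is the complement of accuracy
print(f"accuracy={acc:.1%}, error rate={error_rate:.1%}")
```

The same relationship links the reported figures: an accuracy of 69.8% corresponds to an error rate of 30.2%.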

Conclusion We optimized a random forest machine-learning model to predict the specific cause of death in lung cancer patients using a set of clinicopathologic factors. The model also appears more accurate than a multinomial regression model.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

None.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Informed consent was not obtained for the SEER patients because the dataset is de-identified. Owing to the use of publicly available, de-identified SEER cases, this study was exempt from institutional review board approval. However, we received approval to use the SEER-18 data on the condition of compliance with the program's preset terms (user ID lzhang). Moreover, all 50 states in the USA have laws requiring newly diagnosed cancers to be reported to a central registry. The state cancer registries in the SEER program deposit their extracted, de-identified cancer data in the SEER database after meeting quality control standards (www.seer.cancer.gov). Thus, the SEER data collection was authorized by US state laws and supervised by the respective state public-health officials and ethical review committees.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The SEER data are available upon request from the SEER website (www.seer.cancer.gov). All other data are available upon request.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted September 27, 2020.
Subject Area

  • Pathology