medRxiv

Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests

Cabitza Federico, Campagner Andrea, Ferrari Davide, Di Resta Chiara, Ceriotti Daniele, Sabetta Eleonora, Colombini Alessandra, De Vecchi Elena, Banfi Giuseppe, Locatelli Massimo, Carobene Anna
doi: https://doi.org/10.1101/2020.10.02.20205070
Cabitza Federico
1DISCo, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy
Campagner Andrea
2IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy
Ferrari Davide
3SCVSA Department, University of Parma, Parco Area delle Scienze 11/a, 43124, Parma, Italy
Di Resta Chiara
4Vita-Salute San Raffaele University; Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, Via Olgettina 58, 20132, Milan, Italy
Ceriotti Daniele
5Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
Sabetta Eleonora
5Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
Colombini Alessandra
2IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy
De Vecchi Elena
2IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy
Banfi Giuseppe
2IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy
Locatelli Massimo
5Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
Carobene Anna
5Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
  • For correspondence: Carobene.anna{at}hsr.it

Abstract

Background The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), has known shortcomings: long turnaround time, potential shortage of reagents, false-negative rates around 15–20%, and reliance on expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative.

Methods Three training datasets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted to San Raffaele Hospital (OSR) from February to May 2020, were used to develop machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, blood gas analysis and CO-oximetry values, age, sex and specific symptoms at triage) and two sub-datasets (the COVID-specific and CBC datasets, with 32 and 21 features respectively). A further 58 cases (50% COVID-19 positive) from another hospital and 54 negative patients sampled at OSR in 2018 were used for internal-external and external validation.

Results We developed five ML models. For the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) of the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset, from 0.83 to 0.87; and for the CBC dataset, from 0.74 to 0.86. The models also held up in the internal-external and external validations, with AUC from 0.75 to 0.78 and specificity from 0.92 to 0.96, respectively.
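
The two metrics reported above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `roc_auc` and `specificity`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the fraction of positive-negative pairs in which the positive case receives the higher score (ties count as half), which is equivalent to the area under the ROC curve.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(labels: Sequence[int], scores: Sequence[float],
                threshold: float = 0.5) -> float:
    """Share of negative cases scored below the (assumed) decision threshold."""
    neg_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not neg_scores:
        raise ValueError("specificity needs at least one negative case")
    true_negatives = sum(1 for s in neg_scores if s < threshold)
    return true_negatives / len(neg_scores)


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative;
# scores stand in for a model's predicted probability of being positive.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_spec = specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUC = {toy_auc:.3f}, specificity = {toy_spec:.2f}")
```

Note that a cohort containing only negative cases admits a specificity estimate but not an AUC, since AUC requires both classes; in this sketch `roc_auc` raises a `ValueError` in that situation.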

Conclusions ML can be applied to blood tests as both an adjunct and an alternative to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing a rise in infections.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

No external funding was received.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study protocol (BIGDATA-COVID19) was approved by the Institutional Ethical Review Board (70/INT/2020) of IRCCS San Raffaele Scientific Institute, in agreement with the World Medical Association Declaration of Helsinki, on April 20th 2020, and authorized on April 22nd 2020.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The datasets collected and used in this study are available from the corresponding author on reasonable request.

  • List of abbreviations

    ALP
    Alkaline Phosphatase
    ALT
    Alanine Aminotransferase
    ANGPOC
    Anion gap
    AST
    Aspartate Aminotransferase
    AUC
    Area under the curve
    BA
    Basophils count (%)
    BAT
    Basophils count
    BEEPOC
    Actual Base Excess
    BEPOC
    Base Excess
    BICPOC
    Bicarbonates
    BILD
    Direct Bilirubin
    BILIN
    Indirect Bilirubin
    BILT
    Total Bilirubin
    BISPOC
    Standard Calculated Bicarbonates
    BO2POC
    Bound O2 Maximum Concentration
    CA
    Calcium
    CAPOC
    Ionized Calcium (POC)
    CASPOC
    Standard Ionized Calcium (POC)
    CBC
    complete blood count
    CK
    Creatine kinase
    CLPOC
    Chloride (POC)
    CO2POC
    Carbon Dioxide (pCO2)
    CREA
    Creatinine
    CRP
    C-reactive Protein
    CT
    computed tomography
    CTOPOC
    Total Oxygen
    ED
    emergency department
    EO
    Eosinophils count (%)
    EOT
    Eosinophils count
    FCOPOC
    Carboxyhemoglobin
    FG
    Fibrinogen
    FIOPOC
    Inspired Oxygen Fraction
    FO2POC
    Oxyhemoglobin / Total Hemoglobin
    GGT
    Gamma Glutamyltransferase
    GLU
    Glucose
    GLUEM
    Glucose Blood Gas
    HCT
    Hematocrit
    HCTPOC
    Hematocrit (POC)
    HGB
    Hemoglobin
    HHBPOC
    Deoxyhemoglobin
    IL6
    Interleukin 6
    IOG
    Istituto Ortopedico Galeazzi
    K
    Potassium
    KNN
    k-nearest neighbors
    KPOC
    Potassium (POC)
    LATPOC
    Lactate (POC)
    LDH
    Lactate Dehydrogenase
    LR
    logistic regression
    LY
    Lymphocytes count (%)
    LYT
    Lymphocytes count
    MCH
    Mean Corpuscular Hemoglobin
    MCHC
    Mean Corpuscular Hemoglobin Concentration
    MCV
    Mean Corpuscular Volume
    METPOC
    Methemoglobin
    ML
    machine learning
    MO
    Monocytes count (%)
    MOT
    Monocytes count
    MPV
    Mean Platelet Volume
    NA
    Sodium
    NAPOC
    Sodium (POC)
    NB
    Naive Bayes
    NE
    Neutrophils count (%)
    NET
    Neutrophils count
    NPV
    Negative predictive Value
    OFIPOC
    Inspired O2 / O2 ratio
    OSR
    Ospedale San Raffaele
    PCR
    Polymerase Chain Reaction
    PHPOC
    pH
    PLT
    Platelets
    PO2POC
    Oxygen (pO2)
    PPTR
    Activated partial thromboplastin time (R)
    PPV
    positive predictive value
    PROBNP
    NT-proB-type Natriuretic Peptide
    PTINR
    Prothrombin Time (INR)
    rRT-PCR
    real-time reverse transcription polymerase chain reaction
    RBC
    Red Blood Cells
    RDW
    Red cell distribution width
    RF
    random forest
    ROC
    receiver operating characteristic
    RT-PCR
    reverse transcriptase–PCR
    SO2POC
    O2 Saturation
    SVM
    support vector machine
    THBPOC
    Total Oxyhemoglobin
    TROPOT
    Troponin T
    UREA
    Urea
    WBC
    White blood cells
    XDP
    D-Dimer
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted October 04, 2020.
    Subject Area

    • Infectious Diseases (except HIV/AIDS)