medRxiv

A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years

Bokan Bao, Vahid H. Gazestani, Yaqiong Xiao, Raphael Kim, Austin W.T. Chiang, Srinivasa Nalabolu, Karen Pierce, Kimberly Robasky, Nathan E. Lewis, Eric Courchesne
doi: https://doi.org/10.1101/2021.07.08.21260225
Bokan Bao
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
3 Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
4 Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Vahid H. Gazestani
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Yaqiong Xiao
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
Raphael Kim
5 Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
6 Renaissance Computing Institute, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Austin W.T. Chiang
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Srinivasa Nalabolu
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
Karen Pierce
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
Kimberly Robasky
6 Renaissance Computing Institute, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
7 Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, United States
8 School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
9 Carolina Health and Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
Nathan E. Lewis
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
3 Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
4 Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
For correspondence: nlewisres{at}ucsd.edu, ecourchesne1949{at}gmail.com
Eric Courchesne
1 Autism Center of Excellence, Department of Neuroscience, University of California San Diego, La Jolla, CA, USA

ABSTRACT

Importance Autism spectrum disorder (ASD) diagnosis remains behavior-based, and the median age at first diagnosis remains unchanged at ∼52 months, nearly 5 years after the disorder's first-trimester origin. The long delay between ASD's prenatal onset and its eventual diagnosis likely represents a missed opportunity. However, accurate and clinically translatable early-age diagnostic methods do not exist, owing to the genetic and clinical heterogeneity of ASD. Early-age diagnostic biomarkers of ASD that are robust to this heterogeneity are therefore needed.

Objective To develop a single blood-based molecular classifier that accurately diagnoses ASD at the age of first symptoms.

Design, Setting, and Participants Clinical, diagnostic, and leukocyte RNA data were collected from N=264 toddlers with ASD, typical development (TD), or language delay (LD). Datasets included Discovery (n=175 ASD and TD subjects), Longitudinal (n=33 ASD and TD subjects), and Replication (n=89 ASD, TD, and LD subjects). We developed an ensemble of ASD classifiers by testing 42,840 models, formed by combining 3,570 feature selection sets with 12 classification methods. Models were trained on the Discovery dataset with 5-fold cross-validation. The results were used to construct a Bayesian model averaging (BMA)-based ensemble classifier, which was tested in the Discovery and Replication datasets. Data were collected from 2007 to 2012 and analyzed from August 2019 to April 2021.
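The model search described above can be pictured as a Cartesian product of feature selection sets and classification methods, each candidate evaluated with 5-fold cross-validation on the Discovery dataset. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the identifiers `feature_set_i` and `classifier_j` are hypothetical placeholders (the abstract reports only the counts), and stratifying folds by diagnosis label is an assumption, since the abstract does not say how folds were drawn.

```python
import itertools
import random

# Hypothetical placeholder identifiers; the abstract reports only the counts.
N_FEATURE_SETS = 3_570
N_CLASSIFIERS = 12

feature_sets = [f"feature_set_{i}" for i in range(N_FEATURE_SETS)]
classifiers = [f"classifier_{j}" for j in range(N_CLASSIFIERS)]

# Every (feature set, classifier) pair is one candidate model.
model_grid = list(itertools.product(feature_sets, classifiers))


def stratified_k_folds(labels, k=5, seed=0):
    """Assign each subject index to one of k folds, keeping class ratios similar.

    Assumption: stratification by diagnosis label; the source only says 5-fold CV.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's subjects round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    print(len(model_grid))  # 3,570 x 12 candidate models
    toy_labels = ["ASD"] * 10 + ["TD"] * 5  # toy labels, not study data
    for train, test in train_test_splits(toy_labels):
        print(len(train), len(test))
```

Each candidate in `model_grid` would be fit once per split and scored on the held-out fold; round-robin dealing within each class is one simple way to keep every fold's ASD/TD ratio close to the full dataset's.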

Main Outcomes and Measures Primary outcomes were (1) comparisons of the performance of the 42,840 classifier models in correctly distinguishing ASD from TD and LD in the Discovery and Replication datasets; and (2) the performance of the ensemble model composed of 1,076 models weighted by Bayesian model averaging.

Results Of the 42,840 models trained in the Discovery dataset, 1,076 had a mean AUC-ROC > 0.8. These 1,076 models used 191 different feature routes and 2,764 gene features. An ensemble classifier built by weighting these models with BMA showed excellent performance in both the Discovery and Replication datasets, with ASD classification AUC-ROC scores of 84% to 88%. ASD classification accuracy was comparable whether ASD toddlers were distinguished from LD or from TD subjects, and it was also comparable in the Longitudinal dataset. ASD toddlers with ensemble scores above and below the ASD ensemble mean had similar diagnostic and psychometric scores, but those below the mean had more prenatal risk events than TD toddlers. Ensemble features include genes involved in immune/inflammation, response to cytokines, transcriptional regulation, mitotic cell cycle, and PI3K-AKT, RAS, and Wnt signaling pathways.
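The selection-and-averaging step can be read as: keep the models whose mean cross-validated AUC-ROC exceeds 0.8, give each retained model a posterior weight, and average their predicted ASD probabilities. The abstract does not give the weighting formula, so this sketch assumes equal prior model probabilities and approximates each model's posterior weight from a held-out log-likelihood, normalized with a log-sum-exp for numerical stability. That is closer to so-called pseudo-BMA weighting than to exact BMA, which would use marginal likelihoods. `CandidateModel` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateModel:
    name: str              # hypothetical identifier
    mean_cv_auc: float     # mean AUC-ROC across the 5 CV folds
    log_likelihood: float  # assumed: held-out log-likelihood of the labels


def bma_weights(models):
    """Posterior weights w_k proportional to exp(log_likelihood_k), equal priors assumed."""
    max_ll = max(m.log_likelihood for m in models)
    # Subtracting the max before exponentiating avoids underflow (log-sum-exp trick).
    unnorm = [math.exp(m.log_likelihood - max_ll) for m in models]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def build_ensemble(models, auc_threshold=0.8):
    """Keep models above the AUC threshold and return (model, weight) pairs."""
    kept = [m for m in models if m.mean_cv_auc > auc_threshold]
    if not kept:
        raise ValueError("no model passed the AUC threshold")
    return list(zip(kept, bma_weights(kept)))


def ensemble_score(ensemble, predictions):
    """Weighted average of each retained model's predicted P(ASD) for one subject.

    `predictions` maps model name -> that model's probability for the subject.
    """
    return sum(w * predictions[m.name] for m, w in ensemble)


if __name__ == "__main__":
    candidates = [
        CandidateModel("m1", 0.85, -40.0),
        CandidateModel("m2", 0.82, -41.0),
        CandidateModel("m3", 0.70, -35.0),  # dropped: AUC not above 0.8
    ]
    ensemble = build_ensemble(candidates)
    for model, weight in ensemble:
        print(model.name, round(weight, 3))
    print(round(ensemble_score(ensemble, {"m1": 0.9, "m2": 0.6}), 3))
```

Note that the AUC filter is applied before weighting, so a model with a high likelihood but a sub-threshold AUC (`m3` above) receives no weight at all.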

Conclusions and Relevance An ensemble ASD molecular classifier showed high, replicable accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years, and it has potential for clinical translation.

Question Because ASD is genetically and clinically heterogeneous, can a single blood-based molecular classifier accurately diagnose ASD at the age of first symptoms?

Findings To address this heterogeneity, we developed an ASD classification approach that tested 42,840 models. An ensemble of 1,076 models using 191 different feature routes and 2,764 gene features, weighted by Bayesian model averaging, showed excellent performance in the Discovery and Replication datasets, classifying ASD with area under the receiver operating characteristic curve (AUC-ROC) scores of 84% to 88%. Features include genes involved in immune/inflammation, response to cytokines, transcriptional regulation, mitotic cell cycle, and PI3K-AKT, RAS, and Wnt signaling pathways.
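AUC-ROC, the metric reported throughout the abstract, equals the probability that a randomly chosen ASD subject receives a higher classifier score than a randomly chosen non-ASD subject, with ties counted as one half (the Mann-Whitney formulation). A minimal stdlib sketch computing it directly from that definition, treating both TD and LD as the negative class; the scores are made-up toy values, not study data, and the pairwise O(n·m) loop is only meant for cohorts of a few hundred subjects.

```python
def auc_roc(scores, labels, positive="ASD"):
    """AUC-ROC as the Mann-Whitney probability P(score_pos > score_neg).

    Every label other than `positive` (e.g. TD or LD) is treated as negative.
    Ties between a positive and a negative subject contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative subject")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.4]
    toy_labels = ["ASD", "ASD", "ASD", "TD", "TD", "LD"]
    print(auc_roc(toy_scores, toy_labels))
```

Here 7.5 of the 9 ASD/non-ASD pairs are ordered correctly (one pair is a tie), giving an AUC of about 0.83.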

Meaning An ensemble gene expression ASD classifier has high accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by NIMH grant no. R01-MH110558 (E.C., N.E.L.), NIMH grant no. R01-MH080134 (K.P.), NIMH grant no. R01-MH104446 (K.P.), an NFAR grant (K.P.), NIMH grant no. P50-MH081755 (E.C.), and generous funding from the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (grant no. NNF10CC1016517 to N.E.L.).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

UC San Diego IRB Project #202115X Discovering Biomarkers, Causes and Treatment of ASD through Clinical and Biological Studies

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

All data have been archived at the NIMH Data Archive (NDA).

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 09, 2021.
medRxiv 2021.07.08.21260225; doi: https://doi.org/10.1101/2021.07.08.21260225
Subject Area

  • Psychiatry and Clinical Psychology