Abstract
Autism spectrum disorder (ASD) is a developmental disability that can cause significant social, communication, and behavioral challenges. Many challenges remain for ASD diagnosis and treatment. Current diagnostic criteria are based on behavioral symptoms alone. Wait times for appointments at diagnostic centers range from 9 to 13 months, and a single diagnostic appointment can last several hours. There is an urgent need to identify ASD-associated biomarkers and features to help automate diagnostics and develop predictive ASD models. The present study adopts a novel evolutionary algorithm, the conjunctive clause evolutionary algorithm (CCEA), to select the features most important for distinguishing individuals with and without ASD, using a unique dataset with a small number of samples and a very large number of feature measurements. The dataset comprises both behavioral and neuroimaging measurements from children aged 7 to 14 years: a training set of 21 children and a later testing cohort of 5. Potential biomarker candidates, including volume, area, cortical thickness, and mean curvature in specific regions of the cingulate cortex, frontal cortex, and temporal-parietal junction, were identified. Behavioral features associated with theory of mind were also selected. The study findings demonstrate how machine learning tools can advance ASD research in the era of big data to benefit this special population in the future.
1 Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impairments in social interaction and by restricted and repetitive behaviors [1–4]. According to the Centers for Disease Control and Prevention (CDC) 2018 report, the proportion of U.S. children diagnosed with ASD increased from about 1 in 150 in 2000 to about 1 in 54 in 2016 [5]. The economic burden of pediatric ASD stems largely from the costs of increased use of health services, school support, ASD-related therapy, family services, and caregiver time. Total societal costs for children with ASD in the United States were estimated at $11.5 billion in 2011 [6, 7]. While genetic and environmental factors have been linked to the development of ASD, there is at present no identified cause or cure.
Some symptoms of ASD are not evident until age two or later. In some cases, a child may appear to develop typically until about age two, then stop learning new skills or even lose previously acquired ones [8, 9]. Currently, the diagnosis of autism is based on behavioral symptoms alone. Two behavioral assessment tools commonly guide the diagnostic process: the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and the Autism Diagnostic Interview-Revised (ADI-R) [10, 11]. A typical diagnostic appointment consists of evaluations lasting several hours at a designated clinical office. Because ASD diagnostic examinations are rigorous and time-consuming, demand exceeds the capacity to see patients, and many diagnostic centers have growing wait lists. This bottleneck translates into diagnostic delays of 13 months or longer for minority and lower socioeconomic status groups [2, 8, 12–18]. A substantial number of individuals on the spectrum are also believed to remain undetected [19]. With growing awareness of ASD, there is high demand for a faster, automated diagnostic approach that could allow more efficient diagnosis and early identification of high-risk populations [20].
Building an automated diagnostic and predictive model of ASD is timely, as many studies have adopted machine learning approaches to identify significant biomarkers that include both behavioral and biological features. Duda and colleagues (2016) applied machine learning to distinguish ASD from attention deficit hyperactivity disorder (ADHD) using the 65-item Social Responsiveness Scale [21]. Bone et al. (2015) trained models to distinguish autism from healthy controls using the same Social Responsiveness Scale and the Autism Diagnostic Interview-Revised score [22]. Other studies aggregated items from the ADOS and scores from the Autism Quotient (AQ) to accurately classify an ASD group. However, behavioral measures may be interpreted as subjective, and the selected features can vary widely depending on which tests are used. Consequently, it is important to identify consistent markers associated with ASD.
Because ASD behavioral measures vary widely and are subjective, many studies have searched for brain-based biological markers that reflect a common etiology across individuals with ASD. Such markers are less subjective than behavioral measures and may represent potential targets for treatment. Markers measurable via magnetic resonance imaging (MRI) are especially desirable because they may serve as targets for both treatments and diagnostic tools [23]. Independent structural MRI studies have found differences in whole-brain volume and its developmental trajectory between individuals with and without ASD [24–33]. Other structural brain abnormalities associated with ASD include cortical folding signatures in the following regions: temporal-parietal junction, anterior insula, posterior cingulate, lateral and medial prefrontal cortex, corpus callosum, intra-parietal sulcus, and occipital cortex [24–33]. Evidence also suggests that accelerated expansion of cortical surface area, but not cortical thickness, underlies early brain overgrowth in children with ASD [34], while other studies suggest that individuals with ASD tend to have thinner cortices and reduced surface area as an effect of aging [35].
Machine learning (ML) has been introduced to the neuroimaging field to identify abnormal brain regions in individuals with ASD. The support vector machine (SVM) is an algorithm that is relatively robust to over-fitting and can achieve high classification accuracy without requiring large sample sizes. SVMs have been used to distinguish ASD from controls using features extracted from functional connectivity and grey matter volume [36–40]. Other ML approaches to classifying ASD include deep neural networks [41] and the random forest (RF) algorithm, which uses an ensemble of randomized, independently grown decision trees [42]. Although these methods have demonstrated high accuracy in classifying ASD, they have not identified precise neuroimaging-based biomarkers and features associated with ASD. Most studies have used the Autism Brain Imaging Data Exchange (ABIDE) dataset, which includes 1112 existing resting-state functional magnetic resonance imaging (rs-fMRI) datasets with corresponding structural MRI and phenotypic information from 539 individuals with ASD and 573 age-matched typical controls, collected at 24 international brain imaging laboratories [43].
Classification across a heterogeneous population is challenging [44, 45], particularly when neuroimaging data are pooled from multiple acquisition sites (e.g., the ABIDE dataset, which has considerable variation in demographic and phenotypic profiles). Variance introduced by scanner hardware, imaging protocols, operator characteristics, regional demographics, and other site-specific factors can affect classification performance. This problem is especially relevant for ASD given the inherent heterogeneity of the population. It is also often difficult to collect neuroimaging data from individuals with autism because of the loudness of the scanner and the difficulty of remaining still. As a result, most individual-site datasets have small sample sizes, which can lead to over-fitting and classification inaccuracies. Moreover, while many traditional ML algorithms were designed to classify large amounts of data (e.g., ABIDE) rather than to optimize feature selection, the ultimate goal of ML-based diagnostic classification in neuroimaging is to identify discriminative features that provide insight into abnormal structure and dysfunctional connectivity patterns in the affected population [46].
The present study employed an evolutionary algorithm, the conjunctive clause evolutionary algorithm (CCEA), to select the features most strongly associated with distinguishing individuals with and without ASD. The dataset has a relatively large number of features (both behavioral and neuroimaging measurements) from a total of 26 children, comprising a training set (7 children with ASD and 14 neurotypical (NT) children) and a testing set (1 child with ASD and 4 NT children). Children in the testing set were enrolled as a separate cohort after the CCEA had been trained.
The neuroimaging measurements included brain volume, surface area, cortical thickness, and cortical curvature extracted from whole-brain T1-weighted MRI scans. Behavioral measurements included scores from the Comprehensive Assessment of Spoken Language (CASL) [47], the Universal Nonverbal Intelligence Test-2 (UNIT-2) [48, 49], the Theory of Mind Task Battery (ToMTB) [50], and the Theory of Mind Inventory-2 (ToMI-2) [50]. The present study examined the validity of using the CCEA for feature selection in ASD, particularly to address the challenges that traditional ML algorithms face with small datasets. It aims to identify discriminative biomarkers and behavioral features to help develop an automatic diagnostic and predictive system for ASD.
Although some ML-based methods have been applied to ASD, the suitability of machine learning and the choice of algorithms with regard to the specific behavior examined, as well as the quality and quantity of the data obtained from individual studies, need further investigation [51]. To our knowledge, this is the first study to:
- Classify ASD and select discriminative biomarkers among children from 7 to 14 years old.
- Include both behavioral and biological measurements in the feature selection model.
- Identify models (sets of features) most strongly associated with ASD given a dataset with a relatively small sample size (i.e., N = 26) and a large number of features (i.e., 247 neuroimaging features and 14 behavioral features).
2 Methods
2.1 Participants
A total of 8 children with ASD (1 female; mean age = 11 years) and 18 NT children (7 female; mean age = 10.28 years) were enrolled in the study. All children participated in 2 to 3 hours of baseline behavioral assessments in which both groups completed the CASL, UNIT-2, ToMTB, and ToMI-2, while the ASD group also completed the ADOS-2 and the Social Communication Questionnaire-Lifetime version (SCQ) [52] to confirm their ASD diagnosis. All children with ASD demonstrated understanding of the instructions given in the behavioral and functional magnetic resonance imaging (fMRI) tasks.
2.2 Behavioral Measurements
The CASL is an orally administered, research-based assessment consisting of 15 subtests that measure language in individuals from 3 to 21 years of age. For the present study, only the basic subtests that make up the CASL core language composite were used: Antonyms, Sentence Completion, Syntax Construction, Paragraph Comprehension, and Pragmatic Judgment. The UNIT-2 is a multidimensional assessment of intelligence for individuals with speech, language, or hearing impairments. It consists of nonverbal tasks that test symbolic memory, non-symbolic quantity, analogic reasoning, spatial memory, numerical series, and cube design.
The ToMTB and ToMI-2 are two norm-referenced tools used as outcome measures to assess theory of mind (ToM) [53, 54]. ToM is the ability to reason about the thoughts and feelings of oneself and others, including the ability to predict what others will do or how they will feel in a given situation on the basis of their inferred beliefs. Impaired ToM is a core social deficit in ASD. Scores from both the ToMTB and ToMI-2 provide valid representations of a child's social cognition level. The ToMI-2 is a parent-informant measure of a child's functional level of ToM. Each of its 60 items assesses a particular ToM dimension, with items ranging from simple content to more complex skills. Each item is rated on a 20-unit continuous scale anchored by "Definitely Not" and "Definitely"; respondents place a vertical hash mark at the point on the scale that best reflects their attitude. Item, subscale, and composite scores range from 0 to 20, with higher values reflecting greater parental confidence that the child possesses a particular ToM skill. The ToMI-2 is designed to be a socially and ecologically valid index of ToM as it occurs in everyday social interactions. It has demonstrated excellent test-retest reliability, internal consistency, and criterion-related validity for both NT and ASD children, as well as contrasting-group validity and statistical evidence of construct validity (i.e., factor analysis). The ToMTB directly assesses a child's understanding of a series of scenarios tapping ToM. It consists of 15 test questions within nine tasks, arranged in ascending difficulty. Tasks are presented as short vignettes in a storybook format, with color illustrations and accompanying text on each page. For all tasks, children are presented with one correct response option and three plausible distractors. Memory control questions are included that must be passed for credit on the test questions. The ToMTB has strong test-retest reliability [55–57].
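The two scoring rules described above (ToMTB test questions earn credit only when memory control questions are passed; ToMI-2 scores are means on a 0 to 20 scale) can be sketched as follows. This is a minimal illustration, not the published scoring procedure: the per-task grouping of control and test questions, and the rule that a failed control voids all test questions in that task, are assumptions, and the `tasks` structure and function names are hypothetical.

```python
from statistics import mean

def tomtb_total(tasks):
    """Total ToMTB score (0-15): a task's test questions earn credit only if
    all of that task's memory control questions were passed (assumed grouping)."""
    total = 0
    for task in tasks:
        if all(task["controls_passed"]):
            total += sum(task["test_correct"])  # True counts as 1
    return total

def tomi2_mean(item_ratings):
    """Mean of ToMI-2 item ratings, each on the 0-20 scale (composite or subscale)."""
    for rating in item_ratings:
        if not 0 <= rating <= 20:
            raise ValueError(f"rating outside 0-20: {rating}")
    return mean(item_ratings)

# Hypothetical child: control passed on task 1, failed on task 2.
tasks = [
    {"controls_passed": [True], "test_correct": [True, True]},
    {"controls_passed": [False], "test_correct": [True]},
]
print(tomtb_total(tasks))        # 2: task 2 earns no credit
print(tomi2_mean([10, 15, 20]))  # 15
```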
We included the following scores as features in the CCEA: the CASL total score, the UNIT-2 full scale score, the UNIT-2 abbreviated score, the ToMTB total score, the ToMI-2 total composite mean (assessing overall ToM ability), the ToMI-2 early subscale mean (assessing early-developing ToM abilities such as regulating desire-based emotion and recognizing happiness and sadness), the ToMI-2 basic subscale mean (assessing basic ToM abilities such as recognizing surprise), and the ToMI-2 advanced subscale mean (assessing advanced ToM abilities such as recognizing embarrassment). We also included scores from single items assessing recognition of simple emotions such as happiness and sadness, as well as more complex emotions such as surprise and embarrassment, which children with ASD often find difficult to recognize and process [58–62]. There were 13 behavioral features in total. Table 1 provides demographic information for all participants (N denotes an NT participant and A an ASD participant), including age, gender, and scores on the 13 behavioral measures. T-tests found significant differences (p < 0.05) between the NT and ASD groups on the CASL, UNIT-2 full scale, ToMTB total, ToMI-2 total, ToMI-2 early subscale, ToMI-2 basic subscale, ToMI-2 advanced subscale, ToMI-2 embarrassment, and ToMI-2 desire-based scores, with NT participants scoring higher.
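The text does not state which t-test variant was used. Because the groups differ in size (8 ASD versus 18 NT overall), one reasonable choice is Welch's unequal-variance t-test; the sketch below computes its statistic and Welch-Satterthwaite degrees of freedom with the standard library only. The scores are hypothetical, not study data. A p-value would then come from the t distribution with `df` degrees of freedom; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of group a's mean
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scores, not study data
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 3))  # -3.674 4.0
```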
2.3 MRI Acquisition and Preprocessing
All data were acquired on the 3T Philips Achieva dStream scanner with a 32-channel head coil at the MRI Center for Biomedical Imaging at the University of Vermont (UVM). T1 acquisition parameters were TR = 800 ms, TE = 30 ms, flip angle = 52°, 2.4 mm isotropic resolution, and a 216 × 216 × 144 mm³ field of view, using a multiband acceleration factor of 6 (60 slices, no gap). Participants watched three videos at home before coming to the MRI center. The first was a cartoon video explaining what an MRI is and what one might experience while lying in an MRI scanner [63]. The second video, recorded in the UVM MRI mock scanner room, helped children visualize the real setting and procedures they would experience. The third video explained how to wear earplugs. All participants practiced lying still and became familiar with the scanner noise in the mock scanner room. The T1 structural scan was preprocessed using the Human Connectome Project (HCP) minimal preprocessing pipelines, which include spatial artifact and distortion removal, surface generation, cross-modal registration, and alignment to standard space. These pipelines are specifically designed to capitalize on the high-quality data offered by the HCP. The final standard space uses the CIFTI file format and the associated grayordinates spatial coordinate system, which allows combined cortical surface and subcortical volume analyses while reducing the storage and processing requirements for high spatial and temporal resolution data [64]. Brain anatomical features, including the volume, cortical thickness, mean curvature, and area of all ROIs for each subject, were extracted using the FreeSurfer aparcstats2table script [65]. These ROIs are defined by automatic segmentation procedures that assign one of 37 labels to each brain voxel, including the left and right caudate, putamen, pallidum, thalamus, lateral ventricles, hippocampus, and amygdala [66]. In total, 276 brain features were included.
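Assembling the brain features amounts to joining per-measure ROI tables (volume, thickness, curvature, area) on subject ID into one feature vector per child. The sketch below shows one way to do this with the Python standard library. It assumes tab-delimited text tables whose first column holds subject IDs and whose remaining columns hold ROI measures; the column names, subject IDs, and values are hypothetical, and the exact layout of aparcstats2table output should be checked against the FreeSurfer documentation.

```python
import csv
import io

def read_stats_table(text, delimiter="\t"):
    """Parse one ROI table: first column = subject ID, other columns = ROI measures."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    table = {}
    for row in reader:
        if row:  # skip blank lines
            table[row[0]] = {name: float(v) for name, v in zip(header[1:], row[1:])}
    return table

def merge_tables(tables):
    """Join several ROI tables on subject ID into one feature dict per subject."""
    merged = {}
    for table in tables:
        for subject, features in table.items():
            merged.setdefault(subject, {}).update(features)
    return merged

# Hypothetical two-subject tables (column names and values are illustrative only)
volume = "lh.aparc.volume\tlh_isthmuscingulate_volume\nA01\t3500\nN01\t2900\n"
thickness = "lh.aparc.thickness\tlh_pericalcarine_thickness\nA01\t1.9\nN01\t2.1\n"
features = merge_tables([read_stats_table(volume), read_stats_table(thickness)])
print(features["A01"])
```

Keying on subject ID rather than row order guards against tables whose subjects are listed in different orders.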
2.4 Conjunctive Clause Evolutionary Algorithm
We used a novel evolutionary algorithm to identify the features associated with ASD. The CCEA is a machine learning tool that searches both for the combinations of features associated with a given category (e.g., ASD) and for their corresponding ranges of values [67]. The CCEA can find feature interactions even in the absence of main effects and can therefore find feature combinations that would be difficult to discover using traditional statistics. The CCEA selects the best conjunctive clauses (CCs) of the form:
$$CC = (F_1 \in a_1) \wedge (F_2 \in a_2) \wedge \cdots \wedge (F_k \in a_k)$$
where $F_i$ represents risk factor $i$, whose value lies in the range $a_i$, and the symbol ∧ denotes a conjunction (i.e., a logical AND). A key benefit of the CCEA is that it produces parsimonious models associated with a selected category (e.g., ASD). Model parsimony is measured by the order of the conjunctive clause, i.e., the total number of features it contains. One example of a parsimonious second-order conjunctive clause is: a person with a right hemisphere isthmus cingulate volume of 3,300 to 4,100 mm³ AND a right hemisphere posterior cingulate volume of 4,100 to 6,200 mm³ is more likely to have ASD than someone who does not meet these criteria.
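A conjunctive clause can be represented as a mapping from feature names to value ranges; a subject matches only if every feature falls inside its range, and the clause's order is simply the number of features. The sketch below encodes the second-order example above. It is an illustration of the clause form, not the CCEA implementation [67]; the FreeSurfer-style feature names and the subject's values are hypothetical.

```python
# A conjunctive clause: feature name -> inclusive (low, high) range.
# Feature names are hypothetical; ranges follow the second-order example (mm^3).
example_clause = {
    "rh_isthmuscingulate_volume": (3300.0, 4100.0),
    "rh_posteriorcingulate_volume": (4100.0, 6200.0),
}

def clause_order(clause):
    """Order of a clause = number of features joined by AND."""
    return len(clause)

def satisfies(clause, subject):
    """True only if every feature lies inside its range (logical AND)."""
    return all(low <= subject[name] <= high for name, (low, high) in clause.items())

def describe(clause):
    """Human-readable form of the clause."""
    return " AND ".join(f"{name} in [{low:g}, {high:g}]" for name, (low, high) in clause.items())

subject = {"rh_isthmuscingulate_volume": 3600.0, "rh_posteriorcingulate_volume": 5000.0}
print(clause_order(example_clause), satisfies(example_clause, subject))  # 2 True
print(describe(example_clause))
```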
The fitness of each conjunctive clause is evaluated using the hypergeometric probability mass function (PMF), and only the fittest conjunctive clauses are retained. The hypergeometric PMF is not a p-value and is therefore not subject to the question of which threshold counts as "significant" [68–70]. To prevent overfitting, the CCEA performs a feature sensitivity analysis on each conjunctive clause to ensure that every feature contributes to the overall fitness. For each feature in a conjunctive clause, the sensitivity is the difference between the fitness of the full clause and its fitness when that feature is removed; a feature's sensitivity can thus be viewed as the amount of fitness it contributes to the clause. To visualize the fitness landscape, both positive predictive value and class coverage are calculated. Positive predictive value (PPV) is the number of true positives divided by the sum of true and false positives, and class coverage is the number of true positives divided by the sum of true positives and false negatives (i.e., the proportion of ASD individuals who match the conjunctive clause). In this work, the CCEA was run five times on the training set to search the fitness landscape more thoroughly.
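The fitness, sensitivity, PPV, and coverage calculations described above can be sketched as follows, under one reading of the text. The PMF is evaluated at the observed number of ASD subjects among the clause's matches, i.e. P(X = k) = C(K, k) C(N - K, n - k) / C(N, n). Expressing fitness as -log10(PMF), so that a larger value means a fitter clause when computing sensitivity, is an assumption of this sketch and not necessarily how the CCEA [67] scales fitness; the toy data are hypothetical.

```python
import math

def matches(clause, subject):
    """True if every feature in the clause lies in its inclusive [low, high] range."""
    return all(low <= subject[f] <= high for f, (low, high) in clause.items())

def hypergeom_pmf(k, n, K, N):
    """P(X = k): k class members among n matched subjects,
    drawn from N subjects of whom K belong to the class."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def evaluate(clause, subjects, labels):
    """Return (PMF, PPV, class coverage); labels are 1 for ASD, 0 otherwise."""
    N, K = len(subjects), sum(labels)
    matched = [y for s, y in zip(subjects, labels) if matches(clause, s)]
    n, tp = len(matched), sum(matched)
    ppv = tp / n if n else 0.0       # TP / (TP + FP)
    coverage = tp / K if K else 0.0  # TP / (TP + FN)
    return hypergeom_pmf(tp, n, K, N), ppv, coverage

def sensitivity(clause, subjects, labels):
    """Fitness lost when each feature is dropped. Assumption: fitness = -log10(PMF)."""
    def fitness(c):
        return -math.log10(evaluate(c, subjects, labels)[0])
    full = fitness(clause)
    return {f: full - fitness({g: r for g, r in clause.items() if g != f}) for f in clause}

# Toy data: feature x separates the classes, feature y is uninformative.
subjects = [{"x": 0.5, "y": 5}, {"x": 0.8, "y": 5}, {"x": 2, "y": 5}, {"x": 3, "y": 5}]
labels = [1, 1, 0, 0]
clause = {"x": (0, 1), "y": (0, 10)}
print(evaluate(clause, subjects, labels))     # PMF = 1/6, PPV = 1.0, coverage = 1.0
print(sensitivity(clause, subjects, labels))  # y contributes no fitness
# A clause matching exactly the 7 ASD children among 21 in a training set:
print(hypergeom_pmf(7, 7, 7, 21))             # 1 / C(21, 7) = 1 / 116280
```

The zero sensitivity for `y` illustrates how the check flags a feature that adds order without adding fitness.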
3 Results
3.1 Training Set: 7 ASD and 14 NT
In the training set, 2438 CCs (i.e., models) were generated, ranging from first to fifth order. The PPV of these 2438 models ranged from 46.47% to 100%, and their class coverage ranged from 42.86% to 100%. Among these models, we focused on the most parsimonious (i.e., lower-order) models to draw meaningful conclusions and to avoid the overfitting that is more likely with higher-order models. Accordingly, we selected the 8 second-order models (i.e., those with only two features) with the highest fitness (as measured by the hypergeometric PMF) out of all 520 second-order models. These 8 best-performing models each had 100% PPV and 100% class coverage, and all of their features were brain anatomical features (Table 2). Because we also wished to examine behavioral features, we expanded our analysis to include third-order models (i.e., models with three features). There were 651 third-order models in total; some consisted only of brain anatomical features, while others had two behavioral features plus one brain anatomical feature. We selected the six best-performing third-order models with the highest fitness (PMF), each with 100% PPV and 100% class coverage. Each of these third-order models (Table 3) contained two behavioral features and one brain anatomical feature.
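The model-selection step above (restrict to one order, keep models with 100% PPV and 100% class coverage, rank by fitness) can be sketched as a simple filter and sort. This is one plausible reading of the procedure rather than the authors' code; it assumes a smaller hypergeometric PMF means a fitter clause, and the `Model` records are hypothetical summaries, not the study's actual CCs.

```python
from dataclasses import dataclass

@dataclass
class Model:
    cc_id: int
    order: int
    pmf: float  # hypergeometric PMF (assumed: smaller = fitter)
    ppv: float
    coverage: float

def best_models(models, order, k):
    """Return the k fittest models of a given order with 100% PPV and 100% coverage."""
    perfect = [m for m in models
               if m.order == order and m.ppv == 1.0 and m.coverage == 1.0]
    return sorted(perfect, key=lambda m: m.pmf)[:k]

# Hypothetical model summaries, not the study's actual CCs
models = [
    Model(1, 2, 8.6e-6, 1.0, 1.0),
    Model(2, 2, 2.0e-4, 0.875, 1.0),
    Model(3, 3, 8.6e-6, 1.0, 1.0),
    Model(4, 2, 5.0e-5, 1.0, 1.0),
]
print([m.cc_id for m in best_models(models, order=2, k=8)])  # [1, 4]
```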
Selected second-order models have 100% PPV and 100% class coverage. Taking CC 113 (Table 2) as an example, any subject whose posterior cingulate gyrus volume was within the range of 3,500 to 4,600 mm³ AND whose left rostral middle frontal gyrus volume was within the range of 20,000 to 25,000 mm³ would be classified as having ASD. The volume of the left hemisphere posterior cingulate gyrus and the volume of the right hemisphere isthmus of the cingulate gyrus were the two features that appeared most frequently (i.e., four times) across all models, suggesting that cingulate gyrus volume is a potentially important biomarker for ASD. Figure 1 provides a 2D visualization of the range of feature values (numerical boundaries) associated with these models and the placement of each subject within this range; green dots represent ASD subjects, which cluster within the rectangle defined by the ranges of values in Table 2.
Selected third-order models have 100% PPV and 100% class coverage. Taking CC 46 (Table 3) as an example, any subject who had a ToMTB total score within the range of 5 to 13 AND a ToMI-2 early subscale mean score within the range of 12 to 18 AND a mean curvature value of the left hemisphere pars orbitalis within the range of 0.17 to 2 would fall in the ASD class. The ToMTB total score occurred in all of the best-fit third-order models, and the ToMI-2 early subscale mean score occurred in all but one of them (CC 1163), in which the ToMI-2 total composite mean played that role. This finding further suggests that the ToMTB and ToMI-2 might be effective tools for ASD testing and diagnosis. Figure 2 is a 3D visualization of the CC feature value boundaries and the placement of each subject, where green dots represent ASD subjects, which cluster within the pink cube defined by the feature values in Table 3.
3.2 Testing Set: 1 ASD and 4 NT
A cohort of new subjects comprising 1 child with ASD and 4 NT children was recruited separately at a later time. This cohort served as a testing set to validate the 2438 models generated from the training set. Three third-order models (Table 4) and four fourth-order models (Table 5) were selected from these 2438 models, as these 7 models were the only ones that retained 100% PPV and 100% class coverage on the testing set. Among the features in these models, cortical thickness of the left hemisphere pericalcarine cortex and mean curvature of the right hemisphere pars orbitalis appeared most frequently, suggesting that these areas may play important roles in ASD. The testing set thus provided additional observations with which to examine the robustness of the best-fit models identified on the training set (i.e., parsimonious models with 100% PPV and 100% class coverage).
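The validation step can be sketched as re-scoring every training-set clause on the new cohort and keeping only those that still reach 100% PPV and 100% class coverage. This is a minimal illustration, not the authors' pipeline; the single feature `x`, the clause IDs, and the values are hypothetical.

```python
def ppv_and_coverage(predicted, actual):
    """PPV = TP/(TP+FP); coverage = TP/(TP+FN). Inputs are parallel lists of bools."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    coverage = tp / (tp + fn) if tp + fn else 0.0
    return ppv, coverage

def retained_on_test(clauses, subjects, labels):
    """IDs of clauses that keep 100% PPV and 100% coverage on a held-out cohort."""
    kept = []
    for cc_id, clause in clauses.items():
        predicted = [all(lo <= s[f] <= hi for f, (lo, hi) in clause.items()) for s in subjects]
        if ppv_and_coverage(predicted, labels) == (1.0, 1.0):
            kept.append(cc_id)
    return kept

# Hypothetical 1 ASD + 4 NT test cohort with a single feature "x"
test_subjects = [{"x": 1.0}, {"x": 5.0}, {"x": 6.0}, {"x": 7.0}, {"x": 8.0}]
test_labels = [True, False, False, False, False]
clauses = {101: {"x": (0.0, 2.0)}, 102: {"x": (0.0, 5.5)}}
print(retained_on_test(clauses, test_subjects, test_labels))  # [101]
```

With a single ASD child in the testing set, class coverage on that set can only be 0% or 100%, and a clause reaches 100% PPV only if it also matches none of the four NT children.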
4 Discussion
The present study classified ASD and selected discriminative biomarkers and behavioral features in children with ASD aged 7 to 14 years using a small dataset collected at a single research site. Machine learning (ML) tools have been applied in ASD research for some time, but building a comprehensive prediction model and diagnostic system for ASD remains a distant goal. Despite some progress, other studies have often pooled datasets across research sites for classification purposes rather than identifying discriminative features for diagnostic and treatment development [21, 22, 36–43, 71, 72]. Additionally, traditional ML algorithms do not work well with ASD datasets because of high variance and the heterogeneous nature of the disorder [44, 45]. Moreover, including individuals with ASD in a research study requires a tremendous amount of effort given the social and language challenges of this population. As a result, nearly all ASD datasets have a large number of features but a small sample size, a combination that is ill-suited to many ML algorithms and often leads to overfitting and poor classification accuracy. The present study, however, demonstrated exceptionally good performance (i.e., 100% PPV and 100% class coverage) of the CCEA on a dataset with a large number of features and a small sample size, and in this case identified features strongly associated with ASD.
The features selected by the CCEA included volume, area, cortical thickness, and mean curvature in specific regions of the cingulate cortex, frontal cortex, and temporal-parietal junction as candidate biomarkers for ASD (e.g., the pericalcarine cortex, posterior cingulate cortex, isthmus of the cingulate gyrus, and pars orbitalis). These findings are consistent with previous literature suggesting that individuals with ASD have abnormalities in these brain regions [24–33, 73–75]. Additionally, third-order models from the training set included measurements from the ToMI-2 and the ToMTB as significant features [55–57], which further supports the use of these tools in ASD assessment. Such findings could help clinicians and researchers target specific domains of ToM to improve the social skills of children with ASD. Beyond the 8 second-order and 6 third-order models selected on the training set, the CCEA also yielded 7 additional best-performing third- and fourth-order models on the testing set. Although it would have been ideal if the 8 second-order and 6 third-order models selected on the training set had perfectly modeled the testing set, the fact that 7 of the original models classify the testing set with 100% PPV and 100% class coverage further demonstrates the robustness of the models generated from the training set, as well as the strong performance of the CCEA in this study.
5 Limitations and Future Directions
ASD research often struggles to balance a large sample with high between-subject heterogeneity against a more homogeneous but small sample. Additionally, most currently available ML algorithms are designed for solving classification problems with big data. Although the relatively small sample size is a limitation of the present study, the CCEA achieved exceptional classification performance and identified candidate biomarkers using a small dataset containing a large number of features. Given the strong performance of the CCEA at the current sample size, a larger sample should allow more comprehensive results.
The present study has identified important biomarker candidates for ASD. These candidates fall within the same brain regions that studies using traditional neuroimaging measurements have found to show abnormalities in ASD. Thus, the study provides evidence that AI methodologies can perform as well as traditional approaches in neuroscience and ASD research in selecting neuroanatomical biomarkers. Although AI techniques have been adopted to help with diagnosis and treatment development in medicine, ASD is exceptionally challenging given its great heterogeneity. Extracting robust biomarkers will require a large, diverse, and comprehensive dataset, and analyzing such data with traditional approaches can be very time-consuming and less accurate. Under such circumstances, ML techniques can help advance the development of an automatic diagnostic and predictive system for ASD. In summary, the present study has provided a new direction for adopting AI techniques in ASD research and in medicine more generally.
Data Availability
The data reported in this manuscript were collected by the research team of Dr. Patricia Prelock; access can be granted upon request.
Acknowledgements
This project was supported by a private donor committed to advancing research in autism spectrum disorder. We thank Jay V. Gonyea, Administrative Director, and Scott Hipko, Senior Research Technologist, of the MRI Research Unit at the University of Vermont for their support in acquiring the MRI scans. We thank Richard Watts, Ph.D., Director of the FAS Brain Imaging Center, and Joseph Orr, Ph.D., Assistant Professor at Texas A&M University, for sharing their knowledge of MRI data preprocessing.
Footnotes
* yu.han{at}uvm.edu
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].