Determining the characteristics of genetic disorders that predict inclusion in newborn genomic sequencing programs
==================================================================================================================

* Thomas Minten
* Nina B. Gold
* Sarah Bick
* Sophia Adelson
* Nils Gehlenborg
* Laura M. Amendola
* François Boemer
* Alison J. Coffey
* Nicolas Encina
* Bianca E. Russell
* Laurent Servais
* Kristen L. Sund
* Petros Tsipouras
* David Bick
* Ryan J. Taft
* Robert C. Green
* the ICoNS Gene List Subcommittee

## Abstract

Over 30 international research studies and commercial laboratories are exploring the use of genomic sequencing to screen apparently healthy newborns for genetic disorders. These programs have individualized processes for determining which genes and genetic disorders are queried and reported in newborns. We compared lists of genes from 26 research and commercial newborn screening programs and found substantial heterogeneity among the genes included. A total of 1,750 genes were included in at least one newborn genome sequencing program, but only 74 genes were included on >80% of gene lists, 16 of which are not associated with conditions on the Recommended Uniform Screening Panel. We used a linear regression model to explore factors related to the inclusion of individual genes across programs, finding that a high evidence base as well as treatment efficacy were two of the most important factors for inclusion. We applied a machine learning model to predict how suitable a gene is for newborn sequencing. As knowledge about and treatments for genetic disorders expand, this model provides a dynamic tool to reassess genes for newborn screening implementation. This study highlights the complex landscape of gene list curation among genomic newborn screening programs and proposes an empirical path forward for determining the genes and disorders of highest priority for newborn screening programs.
## Introduction

At least 34 groups worldwide are exploring genomic sequencing as a way to expand newborn screening and identify children at high risk for hundreds of actionable genetic disorders.1 Stakeholders, including diverse groups of parents,2,3 rare disease specialists,4 primary care physicians,5 genetic counselors,4,6 and the public7 support the implementation of genomic newborn screening for at least some genetic disorders. However, many clinical, ethical, and technological barriers to implementation remain.1,8–10 One ongoing challenge is determining which specific genes should be queried and which variants should be reported in apparently healthy infants.

The criteria for disease screening developed by Wilson and Jungner11 in the 1960s have historically provided a framework for the disorders included in public newborn screening programs. These criteria emphasize the inclusion of childhood-onset disorders that are treatable if diagnosed in their early stages. However, current newborn screening practices have recently begun to depart from this paradigm. For example, the additions of infantile Pompe disease and X-linked adrenoleukodystrophy to the Recommended Uniform Screening Panel (RUSP) in the United States in 2015 and 2016 have led to the identification of conditions, such as late-onset Pompe disease or adrenomyeloneuropathy, that do not develop until adulthood or have attenuated forms.12,13 As such, there is now a precedent to consider a broader number of disorders for inclusion in newborn screening, including those that do not develop in the newborn period. Over 700 genetic disorders that present throughout the lifespan now have targeted treatments,4,14 and consensus guidelines for long-term surveillance and management have been developed for many others. Early diagnosis of these conditions has the potential to improve the lifelong health of infants who receive positive genomic screening results.
At present, each research and commercial newborn screening program uses an independent process to design the list of genes that they analyze. Determining which genes, and associated disorders, have high concordance across research studies and commercial products related to genomic newborn screening could support the development of a list of high concordance genes that could be used for pilot screening by public newborn screening laboratories. Prior studies have identified discrepancies across four gene lists curated by commercial genomic newborn screening products,15 as well as five research studies,16 but little is known about the concordance of gene lists used by other programs. Here we compare the content of gene lists from 19 research studies and seven commercial genomic newborn screening programs, using a linear regression model to determine the gene and disease characteristics that are most strongly associated with inclusion among multiple lists.4,17–19 We then use a machine learning random forest model to identify characteristics of genes that predict their inclusion across studies, which can serve as a dynamic resource for determining the acceptability of additional genes to be included in genomic newborn screening in the future.

## Materials and methods

### Identification of lists of genes from research studies and commercial products

Research studies and commercial laboratory gene panels related to genomic newborn screening were identified based on inclusion in the International Consortium on Newborn Sequencing (ICoNS) and through an online search using terms related to genomic sequencing of newborns. In total, 34 programs were identified, of which 26 provided gene lists (Table 1, Figure S1). Only genomic newborn screening programs with independently constructed gene lists that include >100 genes were included in our analysis.

[Table 1.](http://medrxiv.org/content/early/2024/04/05/2024.03.24.24304797/T1)
Research and commercial genomic newborn screening programs. Gene lists from 26 of these programs were included in the analysis (denoted with an asterisk).

We included gene lists from 19 research studies: BabyDetect20, BabyScreen+16, BabySeq217, BeginNGS18,21, Chen et al. 202322, Early Check23, FirstSteps, the Generation study, gnSTAR24,25, GUARDIAN study, Jian et al. 202226, Lee et al. 201927, Luo et al. 202028, NeoExome29, NeoSeq30, NESTS31, NewbornsInSA, Progetto Genoma Puglia, and Wang et al. 202332. In two studies (GUARDIAN study and Early Check23), participants receive testing for a gene list focused on conditions with effective treatments and have the option to be tested for an expanded gene list. For both of these studies, we included only the core gene list focused on treatable genetic conditions. A total of seven lists of genes from commercial firms that offer products related to genomic newborn screening were included: FORESITE 360, Fulgent, Igenomix, Mendelics, Nurture Genomics, PerkinElmer33, and Sema4.20 Of note, the Sema4 product is no longer commercially available. For several gene lists that were included, information on inclusion criteria was obtained from publicly available documents (Table S1).

### Merging of gene lists

Gene names were converted to the current nomenclature set forth by the HUGO Gene Nomenclature Committee (HGNC) based on an available online multi-symbol checker.34 For purposes of analysis, each gene was linked to one condition. The multistep process for linking genes to a single disorder began by first identifying the disease names associated with each gene on Online Mendelian Inheritance in Man (OMIM).35 If only one disease name was associated with the gene on OMIM, a gene-disease pair was formed.
If the gene was known to be associated with multiple OMIM disorders, we used the ClinGen Gene-Validity Resource to select only the disorder with definitive validity when available.36 For genes with more than one disorder with definitive validity or for genes without any disorder with definitive validity, one disorder was selected based on the highest number of programs in this study that indicated it as a screening target. For example, for *RYR1*, which has a definitive association with both susceptibility to malignant hyperthermia and myopathy,36 susceptibility to malignant hyperthermia was selected as the target disorder. Susceptibility to malignant hyperthermia was indicated as a target disease by five of the seven programs that screen for this gene and have disease information available, compared with myopathy, which was listed as a target disease by only two of seven programs. Each gene and associated condition was also assigned to a clinical area, such as neurology or cardiovascular disease, by a medical geneticist (S.B., N.G.). A total of 42 genes were not associated with any disorders on OMIM or ClinGen, possibly because they were candidate genes or had very recently been substantiated as disease genes. For genes not associated with a disorder, the disorder name with the highest number of programs indicating it as a screening target was assigned. Two gene names, *GTM* and *CD1*, which could not be linked to HGNC approved gene names, were omitted from the analysis.

### Characteristics of genes and associated disorders

In order to determine which genes were associated with disorders on the Recommended Uniform Screening Panel (RUSP), genes derived from a list published by Owen et al.21 were cross-referenced with the disease entities listed in the RUSP section of the United States Health Resources and Services Administration (HRSA) website ([https://www.hrsa.gov/advisory-committees/heritable-disorders/rusp](https://www.hrsa.gov/advisory-committees/heritable-disorders/rusp)).
A gene was determined to be associated with a specific disease on the RUSP if it was listed under the “Cause” section of the pertinent disease-specific HRSA webpage.

Using information from previously published studies, most genes were then matched with metrics related to a range of characteristics including: disease penetrance,17,19,37 disease severity,19,37 acceptability and efficacy of treatment,19,37 age of onset,19,37 evidence base (which refers to the level of knowledge about the natural history of the condition and treatments),17,19,37 prevalence,4 and existence of an orthogonal diagnostic test (such as a biochemical biomarker that can be found in blood).4 Similarly, information about the inheritance pattern was drawn from previously published studies.17,19,37 When two modes of inheritance were implicated for the same gene and disease in these studies, such as for *MYO6*, a cause of non-syndromic deafness, dominant inheritance was selected as it was expected to lead to the most inclusive reporting criteria.

Additionally, the age-based semi-quantitative metric (ASQM) score,19,37 a previously published metric which assigns a number between 0 and 15 to a gene-disease pair to denote overall appropriateness for newborn screening, was applied to genes for which it was available. For a majority of genes, the category in the BabySeq study17 was also encoded. BabySeq Category A is designated as the category of genes most amenable to newborn screening, while Category B is considered less amenable to newborn screening. Finally, the proportion of 238 rare disease experts who recommended screening for a gene in an online survey was included for 649 of the genes.4 A description of all metrics can be found in Table S2. Each of these metrics was available only for a subset of genes included in this study (Figure S2).
### Statistical analysis

Descriptive statistics for each gene list, including the length of the list, proportion of genes in each clinical category, and the number of genes associated with RUSP conditions were calculated. The distribution of genes across BabySeq categories, average ASQM, and percentage of survey respondents recommending inclusion for the gene in Gold et al. (2023)4 were calculated within each study for all genes for which these metrics were available.

To provide information on the concordance across all lists of genes, Jaccard similarity indices were calculated. This index measures the number of genes in the intersection set divided by the number of genes in the union set of two gene lists.

A linear regression model was used to identify factors associated with inclusion in multiple gene lists. Two types of regressions were performed: (1) regressions in which the outcome variable is the proportion of gene list inclusion across all genomic newborn screening programs and the independent variables are factors related to each gene and its associated condition (Table S4, S6), and (2) regressions in which inclusion of a gene for each *individual* study was explored (Table S5, S7-S9).

### Prediction analysis

A machine learning algorithm was developed to predict the number of gene lists in which each gene was included. We used the observed overall proportion of the gene lists in which each gene was included as the outcome measure. Predictors were nearly identical to those used in the linear regression model, but the inheritance pattern was excluded as it is unlikely to be a criterion by which newborn screening programs select diseases and genes to assess. Missing variables were handled by adding dummies in the regression model and setting the variable to zero, and by setting the value to a negative value in the random forest model.
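
As a minimal sketch of the two missing-data conventions just described, the Python example below encodes a metric with gaps in both ways. It is an illustration only and not the authors' R or Stata code: the function names, the example scores, and the sentinel value of -1 are assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def encode_for_regression(values: Sequence[Optional[float]]) -> List[Tuple[float, int]]:
    """Return (value, missing_dummy) pairs; a missing value becomes 0 with dummy = 1."""
    return [(0.0, 1) if v is None else (float(v), 0) for v in values]


def encode_for_forest(values: Sequence[Optional[float]], sentinel: float = -1.0) -> List[float]:
    """Replace a missing value with a negative sentinel (assumed here to be -1)."""
    return [sentinel if v is None else float(v) for v in values]


# Hypothetical penetrance scores for four genes; None marks an unavailable metric.
penetrance = [3, None, 1, None]

print(encode_for_regression(penetrance))
print(encode_for_forest(penetrance))
```

In the regression encoding, the dummy column absorbs the average difference for genes without the metric, so the zero placeholder does not distort the coefficient on the metric itself. In the tree-based encoding, a value below every observed score lets a split separate genes with missing data from the rest.
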
Model parameters used in the random forest algorithm were number of trees (100), variables at each node (four), and limits to tree length (none). We tuned one parameter (variables at each node) using the caret package with 3-fold cross-validation in R version 4.3.1 (Vienna, Austria). An ensemble machine learning algorithm was constructed. The final prediction is the average of the predictions of a linear regression model and a random forest prediction algorithm, similar to standard methods in machine learning for numerical data.38 To measure the out-of-sample R-squared of the prediction, we trained the algorithm on 70% of the sample, and measured its accuracy on the hold-out set of 30% of genes. The algorithm was then trained on the entire sample, and variable importance measures and the predictions were obtained in Stata 18 (College Station, TX).

## Results

### Description of genomic newborn screening research programs and commercial products

We identified 34 research and commercial genomic newborn screening programs (Table 1). Of these, 11 are located in North America, 10 in Asia, eight in Europe, four in Oceania and one in South America. A total of 12 of these research studies have resulted in a publication. A total of 11 studies are ongoing and 11 are scheduled to begin. Across the research studies, the number of enrolled or intended participants varies from 48 to 100,000, with the combined total intended sample size being 399,910. The most frequently used methods to sequence DNA in the genomic newborn screening programs are panel sequencing (17 programs) and whole genome sequencing (13 programs). We did not record which programs used proband-only genetic testing, versus dyad or trio sequencing with parental samples.

In the analyses that follow, we included lists of genes that were publicly available or provided by study authors, which included 19 research programs and seven commercial programs.
Information on several of the studies’ inclusion criteria, obtained from publicly available documents, is listed in Table S1. Nearly all studies indicated the intent to include early-onset, severe, treatable conditions in their associated publications or websites.

### List of gene characteristics

In total, 1,750 genes were included in at least one of the 26 gene lists (Table S3). The number of genes included in each list ranged from 134 to 889 (median = 284) (Figure 1). Commercial panels included fewer genes (median 268 vs. 385 for research studies). Cumulatively, genes linked to inherited metabolic (38.1%), immunological (13.0%), endocrinologic (12.0%) and neurological (8.1%) disorders constituted the largest share across the gene lists. Dermatologic (0.3%), orthopedic (0.5%) and ophthalmological (1.0%) disorders accounted for the lowest percentage of disorders across studies. Commercial newborn screening programs had a greater number of genes associated with oncological disorders (median 9 vs. 2 for research studies) and ENT/dental disorders (median 12 vs. 3 for research studies) than research studies. On the other hand, commercial programs had fewer genes related to hematological (median 8 vs. 44 for research studies), immunological (median 26 vs. 35 for research studies), and neurological diseases (median 18 vs. 36 for research studies).

[Figure 1.](http://medrxiv.org/content/early/2024/04/05/2024.03.24.24304797/F1) Clinical areas represented on gene lists from 26 research and commercial genomic newborn screening programs. Genes with no disease association on OMIM or ClinGen have been excluded from this figure.

### Gene list concordance and discordance

Many of the gene lists included in this analysis contain unique genes that are not shared among other studies (Figure 2A).
For example, the lists of the four research studies with the largest intended sample size (BabyDetect, the Generation Study, the GUARDIAN study and NewbornsInSA) share only 98 genes (12%) out of a total of 807 genes. Across these four programs, 322 (40%) genes were included in just one of the four studies. However, among the most recently curated gene lists, from BabyScreen+, the Generation study and Nurture Genomics, a larger proportion of 238 genes (28%) is shared (Figure S5C). Commercial panels from Fulgent, PerkinElmer and Sema4 have substantially overlapping content, with almost the entire gene list of Sema4 incorporated in the larger gene lists of Fulgent and PerkinElmer (Figure S5D).

[Figure 2.](http://medrxiv.org/content/early/2024/04/05/2024.03.24.24304797/F2) Concordance of gene lists across research and commercial genomic newborn screening programs. A. UpSet plot46 of the gene lists of the four largest research studies. The matrix below the bar graph represents each individual study and their intersections. B. Jaccard similarity index for all gene list pairs.

To compare the content of each of the 26 gene lists with one another, we used a pairwise Jaccard Index, a calculated measure of concordance between two gene lists (Figure 2B). We demonstrated that several other pairs of gene lists used in genomic newborn screening programs have highly similar content. Genomic newborn screening programs with shorter gene lists generally have a higher proportion of genes that appear on at least 21 (>80%) of gene lists. We found that the proportion of high-concordance genes on a gene list is negatively correlated with the number of genes included (Pearson correlation coefficient: -0.84) (Figure S4A).

Overall, most genes screened for in various genomic newborn screening programs have little concordance.
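
The pairwise Jaccard comparison can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis code used in the study: the program names and the gene memberships in `toy_lists` are invented for the example, and only the definition (intersection size divided by union size) comes from the Methods.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Number of shared genes divided by the number of genes in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(gene_lists: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every unordered pair of programs (names sorted alphabetically)."""
    return {
        (p, q): jaccard(gene_lists[p], gene_lists[q])
        for p, q in combinations(sorted(gene_lists), 2)
    }


# Toy gene lists; the memberships are hypothetical and do not reflect any real program.
toy_lists = {
    "program_A": {"OTC", "BTK", "CYBB", "G6PD"},
    "program_B": {"OTC", "BTK", "F9"},
    "program_C": {"OTC", "RB1"},
}

scores = pairwise_jaccard(toy_lists)
for pair, value in scores.items():
    print(pair, round(value, 2))
```

With these toy lists, programs A and B share two of five distinct genes (index 0.4), A and C share one of five (0.2), and B and C share one of four (0.25). An index of 1 would indicate identical lists and 0 no shared genes.
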
Of the 1,750 genes included in at least one list, the large majority appeared on only a handful of gene lists. 1,500 (86%) genes appear on 10 lists or fewer, and 1,230 (70%) genes appear on five lists or fewer. A large proportion of genes are included only in three or fewer lists: 610 (35%) out of 1,750 genes were represented on only one gene list, 240 (14%) on two gene lists, and 172 (9%) on three gene lists (Figure S3).

Importantly, there is concordance on a small number of genes: 17 genes (1% of 1,750) are included in all lists. In total, 74 (4% of 1,750) genes appear on at least 21 (>80%) of the gene lists. Of these genes, 58 are associated with diseases on the RUSP. Although the RUSP has been constructed with the American health care setting in mind, European, Asian and Australian sequencing programs screened at a similar rate for RUSP diseases (Table S3). Among genes not associated with disorders that are already on the RUSP, 16 appeared on 21 or more (>80%) lists, and 31 genes appeared on 19 or more lists (Figure 3).

[Figure 3.](http://medrxiv.org/content/early/2024/04/05/2024.03.24.24304797/F3) High concordance genes that are not associated with genetic disorders that are already included on the US RUSP. The x-axis shows each genomic newborn screening program and the y-axis shows individual genes; the corresponding cell is colored if the gene is included on a given list.

### Disease factors associated with inclusion in multiple gene lists

We investigated the factors related to each gene and its associated condition that are most frequently associated with inclusion on multiple lists of genes. Genes associated with RUSP core conditions were 70.2% (se=2.5%) more likely to be included in the list of genes than genes not on the RUSP.
Genes associated with secondary conditions on the RUSP were 55.1% (se=2.8%) more likely to be included (Figure 4A).

Figure 4. Factors associated with gene list inclusion. A. Regression coefficients associated with various gene and disease characteristics predicting inclusion across gene lists. B. Heat map with regression coefficients associated with gene and disease characteristics for each individual genomic newborn screening program. C. Variable importance in random forest prediction model of gene and disease characteristics predicting inclusion across gene lists.

Many other characteristics were correlated with gene list inclusion. The proportion of rare disease experts recommending screening for a gene on a previously administered survey4 was found to be strongly correlated with gene list inclusion (Pearson correlation coefficient = 0.66) (Figure S8). Similarly, the ASQM Score17,19,37 for a gene-disease pair was positively correlated with gene list inclusion (Pearson correlation coefficient = 0.47). There was a strong association between gene list inclusion and the previously defined BabySeq Category A17 (median of 23.9% inclusion vs. median of 10.8% inclusion for genes in Category B), which is designated as the category of genes most amenable to newborn screening (Figure S7).

Linear regressions combine data from different datasets to assess the factors that could explain why some genes were more frequently included in genomic newborn screening than others (Figure 4A, Table S4).
The gene and disease-related factor that was most strongly associated with gene list inclusion is the evidence base for the gene-disease pair, with conditions previously defined as having the highest evidence base score17,19,37 being 31.4% (se=2.5%) more likely to be included in lists of genes. Other factors associated with inclusion across gene lists include high treatment efficacy (16.4%, se=2.4%), high disease penetrance (16.0%, se=2.9%), high disease severity (15.2%, se=3.2%), high acceptability of treatment in terms of the burdens and risks placed on the individual19,37 (14.9%, se=2.4%), a neonatal or infantile onset of the condition (14.3%, se=2.3%), and the existence of an orthogonal test (13.1%, se=2.3%). Additionally, genes that were recommended for inclusion in newborn screening by ≥80% of rare disease experts in a recent survey4 were 44.0% (se=3.1%) more likely to be included than genes that were recommended by fewer experts.

To evaluate whether different gene characteristics may drive the inclusion of a gene in the lists of genes queried by each research program and commercial product, regressions for each genomic newborn screening program were reported separately (Figure 4B and Tables S5, S7-S9). The evidence base of a gene-disease pair was highly predictive of gene inclusion for studies such as BabySeq2 and FORESITE 360 (73.9%, se=5.2% and 53.9%, se=5.0%, respectively), compared with an average of 31.4% (se=2.5%). Other programs such as BabyScreen+ put more weight on early-onset conditions (35.4%, se=5.0% vs. average of 14.3%, se=2.4%). While the coefficients on all these characteristics vary between programs (Figure 4B), the vast majority of the associations are positive, suggesting that all programs mostly value genes with each of these characteristics for genomic newborn screening, yet weigh these characteristics differently when curating gene lists.
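
To make the reading of these coefficients concrete, the sketch below fits an ordinary least squares model to a small invented dataset. It is a toy illustration under stated assumptions, not the model reported in Figure 4A: the two binary predictors (`high_evidence`, `early_onset`), the four example genes, and their inclusion proportions are hypothetical, and the published regressions include more covariates as well as missing-data dummies.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0, *r] for r in x]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical genes: columns are high_evidence and early_onset (1 = yes, 0 = no).
features = [[0, 0], [1, 0], [0, 1], [1, 1]]
# Hypothetical outcome: proportion of gene lists that include each gene.
inclusion = [0.10, 0.40, 0.25, 0.55]

coef = ols(features, inclusion)
print([round(c, 3) for c in coef])
```

In this toy fit the coefficient on `high_evidence` is approximately 0.30, meaning that a gene with the characteristic is expected to appear on about 30 percentage points more lists than an otherwise similar gene without it. The percentages reported above for the regression can be read in the same way.
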
### Random forest prediction model

Because the information related to gene and disease-related characteristics such as disease prevalence, penetrance, and treatment type may change over time, we used a machine learning algorithm to determine how these and other factors predict inclusion across gene lists (Figure S9). The relative importance of all variables for prediction purposes is shown in the random forest prediction, a proxy for how much variation is explained by each variable (Figure 4C). The evidence base, recommendation proportion and the treatment acceptability are the three most important predictors of gene list inclusion. Our prediction model was able to explain up to 79% of the variation in gene list inclusion with the characteristics in our dataset (out-of-sample R² of the linear regression = 0.74, random forest algorithm R² = 0.81). As more information about these genes becomes available, novel gene-disease associations are discovered, and new therapeutics emerge, these metrics may provide information about whether a gene is a good candidate for genomic newborn screening.

## Discussion

At least 34 programs and commercial products worldwide are exploring genomic newborn screening, but the specific genes and associated genetic conditions queried by each program vary widely. In this study, we explored heterogeneity across gene lists and determined which genes are of highest concordance across studies and commercial products. We then identified the characteristics of genes and genetic disorders that predict their inclusion across multiple research and commercial newborn screening programs.

The size and content of the gene lists used for newborn screening vary among the programs we assessed, but they all share a common emphasis on some clinical areas and specific genes.
Beyond the genes associated with conditions on the RUSP, the programs in this study each include a large proportion of genes associated with inherited metabolic, immunologic, and endocrine disorders, many of which are early-onset and treatable. More specifically, *OTC*, associated with ornithine transcarbamylase deficiency, had 96% concordance across lists and the genes associated with glycogen storage disease types Ia and Ib had 96% and 92% concordance respectively, similar to findings from a survey of rare disease experts.4 These genes lack biomarkers that can be accurately assayed on a population scale and are therefore obvious candidates for ascertainment by genetic testing. Several genes associated with primary immunodeficiencies, including *CYBB*, associated with chronic granulomatous disease, and *BTK*, associated with X-linked agammaglobulinemia, also had high concordance. The ascertainment of these disorders using genomic sequencing is a natural extension from the identification of severe combined immunodeficiency using T cell receptor excision circles.39 Importantly, genes associated with secondary conditions on the RUSP, such as those associated with 3-methyl-crotonyl-CoA carboxylase deficiency, were widely included across lists. Many such conditions do not conform to the historic Wilson-Jungner criteria, suggesting that some newborn sequencing programs have upheld the status quo in testing even when the disorders are not early-onset or highly treatable.

Several factors may contribute to the variability of the gene lists. First, treatment accessibility and reimbursement criteria may differ from country to country. For example, several drugs that aim to produce exon-skipping in disease genes are available in the US but not in the European Union. Accordingly, the definition of which disorders are actionable or treatable may differ.
Conditions with high concordance across lists, such as metabolic disorders in which dietary changes can prevent catastrophic hypoglycemia or metabolite intoxication, seem ripe for inclusion in newborn screening now. Yet, the benefit of including other disorders is not as clear, as reflected by inconsistent inclusion across gene lists. The causative genes related to tuberous sclerosis (*TSC1* and *TSC2*), for example, appear on 14 of 26 lists. For this condition, early anti-epileptic treatment can mitigate the devastating effects of West syndrome, but only in the minority of patients who present with an abnormal electroencephalogram.40 Finally, cultural differences may play a role in the breadth of the gene lists, as some treatments are not available in every country or some disorders may not be perceived as sufficiently actionable by every study team. Interestingly, in this study, the median length of gene lists from US-based research programs was 263, compared to 405 from research studies based in Europe.

Most research and commercial genomic newborn screening programs report that they sought to include genes associated with disorders that are childhood-onset and treatable. In this study, we determined whether other characteristics of genes and diseases predicted their inclusion within individual genomic newborn screening programs and across lists of genes from multiple programs. To do so, we used data from prior studies to classify each gene and its associated disorders more deeply across several axes. Unsurprisingly, the linear regression model demonstrated that the most important factor for inclusion was whether or not a gene is associated with a disorder that is already included in the RUSP. Commercial programs in particular included predominantly genes associated with disorders on the RUSP and were therefore highly concordant with one another.
Genomic sequencing would be an important adjunct to biochemical screening of these disorders, as the two methodologies have been demonstrated to be complementary.41,42 Unexpectedly, 42 genes without a disease association on OMIM or ClinGen were included on several lists, demonstrating that some programs were willing to include candidate genes or those with new associations with disease. Similar to prior studies,4 genes associated with relatively common and treatable hematologic disorders, such as G6PD deficiency and hemophilia type B, were also found to be included across the majority of lists. Notably, however, *F8*, the gene associated with hemophilia A, which shares similar genetic and clinical characteristics with hemophilia B, was included only in a minority of lists. This discrepancy perhaps points to the technical challenge of identifying the two most common variants in *F8*, which are inversions and are difficult to detect using genomic sequencing.43 Of note, in contrast to the results from prior studies, genes associated with pediatric hereditary cancer predisposition syndromes, such as *RB1*, were not highly concordant across gene lists.

In both the linear regression model and machine learning prediction model, gene and disease characteristics found to be of high importance were the evidence base, disease penetrance, the acceptability of treatment, and the presence of an orthogonal diagnostic test. These metrics are likely to change for individual genes over time as new clinical information and therapies emerge. The penetrance of each disorder, in particular, may not be well-understood until population-wide genomic screening becomes routine.44,45 Unexpectedly, despite the stated intent of many newborn screening programs, age of onset of a disorder and disease severity were not strong predictors of inclusion across lists, perhaps because these are in fact subjective designations.

This study has several limitations.
First, due to the rare nature of many genetic disorders, knowledge about disease characteristics such as penetrance, age of onset, and available therapies is often imperfect. With limited evidence, individuals curating the gene lists might make disparate selections even when intending to apply the same selection criteria. Additionally, these programs are based in different countries, where the population frequencies of some genetic disorders may vary. International health care systems may also offer different specialists or disease treatments, which may influence gene inclusion.

The international interest in genomic newborn screening has prompted urgent questions about which genes and disorders should be evaluated in infants. In total, 74 genes (of which 16 are not related to RUSP disorders) appeared in at least 21 of 26 lists. It seems reasonable that these genes should be prioritized for population-wide implementation. In the future, knowledge about a genetic condition, its available treatment, and its presumed penetrance are important characteristics to consider when assessing its suitability for screening. Rather than design a static list of genes for population-wide implementation, the predictors generated by this model might be best used to shape an updated form of the Wilson-Jungner criteria suited for genomic newborn screening. These criteria could be applied to a centralized database of genes that is routinely updated to prioritize new genes for screening.

## Supporting information

Supplemental information [supplements/304797_file02.pdf]

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

## Declaration of interests

L.M.A., A.J.C. and R.J.T. are employees and shareholders at Illumina Inc. N.G. is co-founder and equity owner of Datavisyn. N.B.G. provides occasional consulting services to RCG Consulting and receives honoraria from Ambry Genetics. R.C.G.
has received compensation for advising the following companies: Allelica, Atria, Fabric, Genomic Life and Juniper Genomics; and is co-founder of Genome Medical and Nurture Genomics. B.E.R. and K.L.S. are consultants at Nurture Genomics. L.S. received personal compensation from Zentech and Illumina Inc. P.T. is a co-founder of PlumCare RWE, LLC.

## Author contributions

Conceptualization: L.M.A., D.B., F.B., A.J.C., N.E., N.B.G., R.C.G., T.M., B.E.R., K.L.S., L.S., R.J.T. Data curation: S.B., A.J.C., T.M., K.L.S. Formal analysis: N.B.G., T.M. Funding acquisition: R.C.G., L.S., P.T. Investigation: S.A., S.B., N.B.G., T.M. Methodology: N.G., N.B.G., R.C.G., T.M. Project administration: T.M. Software: T.M. Supervision: N.B.G., R.C.G., L.S. Visualization: N.G., N.B.G., T.M. Writing-original draft: S.A., N.B.G., T.M. Writing-review & editing: all authors.

## Web resources

* BabyDetect, [https://babydetect.com/en/](https://babydetect.com/en/)
* BabyScreen+, [https://babyscreen.mcri.edu.au/](https://babyscreen.mcri.edu.au/)
* BabySeq, [https://www.genomes2people.org/research/babyseq/](https://www.genomes2people.org/research/babyseq/)
* BeginNGS, [https://radygenomics.org/begin-ngs-newborn-sequencing/](https://radygenomics.org/begin-ngs-newborn-sequencing/)
* EarlyCheck, [https://earlycheck.org/](https://earlycheck.org/)
* FirstSteps, [https://www.firststeps-ngs.gr/](https://www.firststeps-ngs.gr/)
* FORESITE 360, [https://foresite360.com/](https://foresite360.com/)
* Fulgent Genetics, [https://www.fulgentgenetics.com/](https://www.fulgentgenetics.com/)
* Genomics England Newborn Genomes Programme, [https://www.genomicsengland.co.uk/initiatives/newborns](https://www.genomicsengland.co.uk/initiatives/newborns)
* GUARDIAN Study, [https://guardian-study.org/](https://guardian-study.org/)
* International Consortium on Newborn Sequencing (ICoNS), [https://www.iconseq.org/](https://www.iconseq.org/)
* Igenomix, [https://www.igenomix.eu/](https://www.igenomix.eu/)
* Mendelics, [https://mendelics.com.br/](https://mendelics.com.br/)
* NEW_LIVES, [https://www.klinikum.uni-heidelberg.de/en/new-lives-genomic-newborn-screening-programs](https://www.klinikum.uni-heidelberg.de/en/new-lives-genomic-newborn-screening-programs)
* NewbornsInSA, [https://www.wch.sa.gov.au/research/other-research-projects/newbornsinsa-research-study](https://www.wch.sa.gov.au/research/other-research-projects/newbornsinsa-research-study)
* Nurture Genomics, [https://nurturegenomics.com/](https://nurturegenomics.com/)
* Screen4Care, [https://screen4care.eu/](https://screen4care.eu/)
* ScreenPlus, [https://www.einsteinmed.edu/research/screenplus/](https://www.einsteinmed.edu/research/screenplus/)
* PerkinElmer/Revvity, [https://www.revvity.com/be-en/category/newborn-screening](https://www.revvity.com/be-en/category/newborn-screening)

## ICoNS Gene List Subcommittee

Programs: BabyDetect, BabyScreen+, BabySeq2, BeginNGS, Chen et al. 2023, Early Check, FirstSteps, the Generation study, gnSTAR, GUARDIAN study, Jian et al. 2022, Lee et al. 2019, Luo et al. 2020, NeoExome, NeoSeq, NESTS, NewbornsInSA, Progetto Genoma Puglia, Wang et al. 2023, FORESITE 360, Fulgent, Igenomix, Nurture Genomics, PerkinElmer, Sema4.

Individuals: [Table 2](http://medrxiv.org/content/early/2024/04/05/2024.03.24.24304797/T2)

## Acknowledgements

This work was supported by the following grants: T32GM007748 (S.B.), R01HG011773 (N.G.), K08HG012811-01 (N.B.G.), TR003201 (N.B.G.), HD077671 (R.C.G.) and TR003201 (R.C.G.).

## Footnotes

* The supplemental information was updated, as one page was missing from the previous version.
* Received March 24, 2024.
* Revision received April 4, 2024.
* Accepted April 5, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory. The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1. Stark, Z., and Scott, R.H. (2023).
Genomic newborn screening for rare diseases. Nat. Rev. Genet. 24, 755–766.
2. Joseph, G., Chen, F., Harris-Wai, J., Puck, J.M., Young, C., and Koenig, B.A. (2016). Parental Views on Expanded Newborn Screening Using Whole-Genome Sequencing. Pediatrics 137 Suppl 1, S36–S46.
3. Timmins, G.T., Wynn, J., Saami, A.M., Espinal, A., and Chung, W.K. (2022). Diverse Parental Perspectives of the Social and Educational Needs for Expanding Newborn Screening through Genomic Sequencing. Public Health Genomics 1–8.
4. Gold, N.B., Adelson, S.M., Shah, N., Williams, S., Bick, S.L., Zoltick, E.S., Gold, J.I., Strong, A., Ganetzky, R., Roberts, A.E., et al. (2023). Perspectives of Rare Disease Experts on Newborn Genome Sequencing. JAMA Netw Open 6, e2312231.
5. Acharya, K., Ackerman, P.D., and Ross, L.F. (2005). Pediatricians’ attitudes toward expanding newborn screening. Pediatrics 116, e476–e484.
6. Cao, M., Notini, L., Ayres, S., and Vears, D.F. (2023). Australian healthcare professionals’ perspectives on the ethical and practical issues associated with genomic newborn screening. J. Genet. Couns. 32, 376–386.
7. Bombard, Y., Miller, F.A., Hayeems, R.Z., Barg, C., Cressman, C., Carroll, J.C., Wilson, B.J., Little, J., Avard, D., Painter-Main, M., et al. (2014).
Public views on participating in newborn screening using genome sequencing. Eur. J. Hum. Genet. 22, 1248–1254.
8. Downie, L., Halliday, J., Lewis, S., and Amor, D.J. (2021). Principles of Genomic Newborn Screening Programs: A Systematic Review. JAMA Netw Open 4, e2114336.
9. Bick, D., Ahmed, A., Deen, D., Ferlini, A., Garnier, N., Kasperaviciute, D., Leblond, M., Pichini, A., Rendon, A., Satija, A., et al. (2022). Newborn Screening by Genomic Sequencing: Opportunities and Challenges. Screening 8,.
10. Horton, R., and Lucassen, A. (2023). Ethical Considerations in Research with Genomic Data. New Bioeth 29, 37–51.
11. Wilson, J.M.G., and Jungner, G. (1966). The Principles and Practice of Screening for Disease.
12. Bodamer, O.A., Scott, C.R., Giugliani, R., and Pompe Disease Newborn Screening Working Group (2017). Newborn Screening for Pompe Disease. Pediatrics 140, S4–S13.
13. Turk, B.R., Theda, C., Fatemi, A., and Moser, A.B. (2020). X-linked adrenoleukodystrophy: Pathology, pathophysiology, diagnostic testing, newborn screening and therapies. Int. J. Dev. Neurosci. 80, 52–72.
14. Bick, D., Bick, S.L., Dimmock, D.P., Fowler, T.A., Caulfield, M.J., and Scott, R.H. (2021). An online compendium of treatable genetic disorders. Am. J. Med. Genet. C Semin. Med. Genet. 187, 48–54.
15. DeCristo, D.M., Milko, L.V., O’Daniel, J.M., Foreman, A.K.M., Mollison, L.F., Powell, B.C., Powell, C.M., and Berg, J.S. (2021). Actionability of commercial laboratory sequencing panels for newborn screening and the importance of transparency for parental decision-making. Genome Med. 13, 50.
16. Downie, L., Bouffler, S.E., Amor, D.J., Christodoulou, J., Yeung, A., Horton, A.E., Macciocca, I., Archibald, A.D., Wall, M., Caruana, J., et al. (2024). Gene selection for genomic newborn screening: moving towards consensus? Genet. Med. 101077.
17. Ceyhan-Birsoy, O., Machini, K., Lebo, M.S., Yu, T.W., Agrawal, P.B., Parad, R.B., Holm, I.A., McGuire, A., Green, R.C., Beggs, A.H., et al. (2017). A curated gene list for reporting results of newborn genomic sequencing. Genet. Med. 19, 809–818.
18. Kingsmore, S.F., Smith, L.D., Kunard, C.M., Bainbridge, M., Batalov, S., Benson, W., Blincow, E., Caylor, S., Chambers, C., Del Angel, G., et al. (2022). A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases. Am. J. Hum. Genet. 109, 1605–1619.
19. Milko, L.V., O’Daniel, J.M., DeCristo, D.M., Crowley, S.B., Foreman, A.K.M., Wallace, K.E., Mollison, L.F., Strande, N.T., Girnary, Z.S., Boshe, L.J., et al. (2019). An Age-Based Framework for Evaluating Genome-Scale Sequencing Results in Newborn Screening. J. Pediatr. 209, 68–76.
20. Dangouloff, T., Hovhannesyan, K., Piazzon, F., Mashhadizadeh, D., Helou, L., Palmeira, L., Boemer, F., and Servais, L. (2023). Baby detect: Universal genomic newborn screening for early, treatable, and severe conditions. J. Neurol. Sci. 455,.
21. Owen, M.J., Lefebvre, S., Hansen, C., Kunard, C.M., Dimmock, D.P., Smith, L.D., Scharer, G., Mardach, R., Willis, M.J., Feigenbaum, A., et al. (2022). An automated 13.5 hour system for scalable diagnosis and acute management guidance for genetic diseases. Nat. Commun. 13, 4057.
22. Chen, T., Fan, C., Huang, Y., Feng, J., Zhang, Y., Miao, J., Wang, X., Li, Y., Huang, C., Jin, W., et al. (2023). Genomic Sequencing as a First-Tier Screening Test and Outcomes of Newborn Screening. JAMA Netw Open 6, e2331162.
23. Bailey, D.B., Jr., Gehtland, L.M., Lewis, M.A., Peay, H., Raspa, M., Shone, S.M., Taylor, J.L., Wheeler, A.C., Cotten, M., King, N.M.P., et al. (2019). Early Check: translational science at the intersection of public health and newborn screening. BMC Pediatr. 19, 238.
24. Huang, X., Wu, D., Zhu, L., Wang, W., Yang, R., Yang, J., He, Q., Zhu, B., You, Y., Xiao, R., et al. (2022). Application of a next-generation sequencing (NGS) panel in newborn screening efficiently identifies inborn disorders of neonates. Orphanet J. Rare Dis. 17, 66.
25. Yang, R.-L., Qian, G.-L., Wu, D.-W., Miao, J.-K., Yang, X., Wu, B.-Q., Yan, Y.-Q., Li, H.-B., Mao, X.-M., He, J., et al. (2023). A multicenter prospective study of next-generation sequencing-based newborn screening for monogenic genetic diseases in China. World J. Pediatr. 19, 663–673.
26. Jian, M., Wang, X., Sui, Y., Fang, M., Feng, C., Huang, Y., Liu, C., Guo, R., Guan, Y., Gao, Y., et al. (2022). A pilot study of assessing whole genome sequencing in newborn screening in unselected children in China. Clin. Transl. Med. 12, e843.
27. Lee, H., Lim, J., Shin, J.E., Eun, H.S., Park, M.S., Park, K.I., Namgung, R., and Lee, J.S. (2019). Implementation of a Targeted Next-Generation Sequencing Panel for Constitutional Newborn Screening in High-Risk Neonates. Yonsei Med. J. 60, 1061–1066.
28. Luo, X., Sun, Y., Xu, F., Guo, J., Li, L., Lin, Z., Ye, J., Gu, X., and Yu, Y. (2020). A pilot study of expanded newborn screening for 573 genes related to severe inherited disorders in China: results from 1,127 newborns. Ann Transl Med 8, 1058.
29. Cao, Z., He, X., Wang, D., Gu, M., Suo, F., Qiang, R., Zhang, R., Song, C., Wang, X., Zhu, B., et al. (2024). Targeted exome sequencing strategy (NeoEXOME) for Chinese newborns using a pilot study with 3423 neonates. Mol Genet Genomic Med 12, e2357.
30. Wang, H., Yang, Y., Zhou, L., Wang, Y., Long, W., and Yu, B. (2021). NeoSeq: a new method of genomic sequencing for newborn screening. Orphanet J. Rare Dis. 16, 481.
31. Hao, C., Guo, R., Hu, X., Qi, Z., Guo, Q., Liu, X., Liu, Y., Sun, Y., Zhang, X., Jin, F., et al. (2022). Newborn screening with targeted sequencing: a multicenter investigation and a pilot clinical study in China. J. Genet. Genomics 49, 13–19.
32. Wang, X., Sun, Y., Guan, X.-W., Wang, Y.-Y., Hong, D.-Y., Zhang, Z.-L., Li, Y.-H., Yang, P.-Y., Jiang, T., and Xu, Z.-F. (2023). Newborn genetic screening is highly effective for high-risk infants: A single-centre study in China. J. Glob. Health 13, 04128.
33. Balciuniene, J., Liu, R., Bean, L., Guo, F., Nallamilli, B.R.R., Guruju, N., Chen-Deutsch, X., Yousaf, R., Fura, K., Chin, E., et al. (2023). At-Risk Genomic Findings for Pediatric-Onset Disorders From Genome Sequencing vs Medically Actionable Gene Panel in Proactive Screening of Newborns and Children. JAMA Netw Open 6, e2326445.
34. Seal, R.L., Braschi, B., Gray, K., Jones, T.E.M., Tweedie, S., Haim-Vilmovsky, L., and Bruford, E.A. (2023). Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 51, D1003–D1009.
35. Hamosh, A., Scott, A.F., Amberger, J., Valle, D., and McKusick, V.A. (2000). Online Mendelian Inheritance in Man (OMIM). Hum. Mutat. 15, 57–61.
36. Rehm, H.L., Berg, J.S., Brooks, L.D., Bustamante, C.D., Evans, J.P., Landrum, M.J., Ledbetter, D.H., Maglott, D.R., Martin, C.L., Nussbaum, R.L., et al. (2015). ClinGen--the Clinical Genome Resource. N. Engl. J. Med. 372, 2235–2242.
37. Berg, J.S., Foreman, A.K.M., O’Daniel, J.M., Booker, J.K., Boshe, L., Carey, T., Crooks, K.R., Jensen, B.C., Juengst, E.T., Lee, K., et al. (2016). A semiquantitative metric for evaluating clinical actionability of incidental or secondary findings from genome-scale sequencing. Genet. Med. 18, 467–475.
38. Einav, L., Finkelstein, A., Mullainathan, S., and Obermeyer, Z. (2018). Predictive modeling of U.S. health care spending in late life. Science 360, 1462–1465.
39. van der Spek, J., Groenwold, R.H.H., van der Burg, M., and van Montfrans, J.M. (2015). TREC Based Newborn Screening for Severe Combined Immunodeficiency Disease: A Systematic Review. J. Clin. Immunol. 35, 416–430.
40. Husain, A.M., Foley, C.M., Legido, A., Chandler, D.A., Miles, D.K., and Grover, W.D. (2000). West syndrome in tuberous sclerosis complex. Pediatr. Neurol. 23, 233–235.
41. Wojcik, M.H., Zhang, T., Ceyhan-Birsoy, O., Genetti, C.A., Lebo, M.S., Yu, T.W., Parad, R.B., Holm, I.A., Rehm, H.L., Beggs, A.H., et al. (2021). Discordant results between conventional newborn screening and genomic sequencing in the BabySeq Project. Genet. Med. 23, 1372–1375.
42. Adhikari, A.N., Gallagher, R.C., Wang, Y., Currier, R.J., Amatuni, G., Bassaganyas, L., Chen, F., Kundu, K., Kvale, M., Mooney, S.D., et al. (2020). The role of exome sequencing in newborn screening for inborn errors of metabolism. Nat. Med. 26, 1392–1397.
43. Johnsen, J.M., Fletcher, S.N., Dove, A., McCracken, H., Martin, B.K., Kircher, M., Josephson, N.C., Shendure, J., Ruuska, S.E., Valentino, L.A., et al. (2022). Results of genetic analysis of 11 341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 20, 2022–2034.
44. Gold, N.B., Harrison, S.M., Rowe, J.H., Gold, J., Furutani, E., Biffi, A., Duncan, C.N., Shimamura, A., Lehmann, L.E., and Green, R.C. (2022). Low frequency of treatable pediatric disease alleles in gnomAD: An opportunity for future genomic screening of newborns. HGG Adv 3, 100059.
45. Gold, J.I., Madhavan, S., Park, J., Zouk, H., Perez, E., Strong, A., Drivas, T.G., Karaa, A., Yudkoff, M., Rader, D., et al. (2023). Phenotypes of undiagnosed adults with actionable OTC and GLA variants. HGG Adv 4, 100226.
46. Conway, J.R., Lex, A., and Gehlenborg, N. (2017).
UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33, 2938–2940.