Abstract
To discover rare disease-gene associations, we developed a gene burden analytical framework and applied it to rare, protein-coding variants from whole genome sequencing of 35,008 cases with rare diseases and their family members recruited to the 100,000 Genomes Project (100KGP). Following in silico triaging of the results, 88 novel associations were identified, including 38 with existing experimental evidence. We have published the confirmation of one of these associations, hereditary ataxia with UCHL1, and independent confirmatory evidence has recently been published for four more. We highlight a further seven compelling associations: hypertrophic cardiomyopathy with DYSF and SLC4A3, where both genes show high or specific heart expression and have existing associations with skeletal dystrophies and short QT syndrome, respectively; monogenic diabetes with UNC13A, which has a known role in the regulation of β cells and a mouse model with impaired glucose tolerance; epilepsy with KCNQ1, where a mouse model shows seizures and the existing long QT syndrome association may be linked; early-onset Parkinson’s disease with RYR1, which has existing links to tremor pathophysiology and a mouse model with neurological phenotypes; anterior segment ocular abnormalities with POMK, which shows expression in corneal cells and has a zebrafish model with developmental ocular abnormalities; and cystic kidney disease with COL4A3, which shows high renal expression and prior evidence for a digenic or modifying role in renal disease. Confirmation of all 88 associations would lead to potential diagnoses in 456 molecularly undiagnosed cases within the 100KGP, as well as in other rare disease patients worldwide, highlighting the clinical impact of a large-scale statistical approach to rare disease gene discovery.
Competing Interest Statement
The authors declare the following competing interests: D.S. and M.C. were seconded to and received salary from Genomics England, a wholly owned Department of Health and Social Care company, from 2016-2018 and 2013-2021, respectively. E.O.T. has research funding from Kamari Pharma, Palvella Therapeutics, Unilever and the Leo Foundation unrelated to this work. She is CI for a trial for Kamari Pharma and performs consultancy for Kamari Pharma, Azitra and Palvella Therapeutics (all payments go to the university). S.L.Z. has provided consultancy services to Health Lumen.
Funding Statement
The National Genomic Research Library is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. P.L. was supported by GACR 24-10324S. This research was funded in part by Aligning Science Across Parkinson's [Grant numbers: ASAP-000478 and ASAP-000509] through the Michael J. Fox Foundation for Parkinson's Research (MJFF). K.B. and P.C. were supported by the Health Research Board, and E.E. by Star MD. This is part of the NIHR Barts Biomedical Research Centre (Caulfield, Jones) portfolio of research. The analysis was supported by a grant from the NIH, National Institute of Child Health and Human Development (1R01HD103805-01), and by PhD funding from the UCLH NIHR Hearing Health BRC.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Health Research Authority (HRA) Research Ethics Committee (REC) East of England - Cambridge South (Ref 14/EE/1112) gave ethical approval for the 100,000 Genomes Project.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
This research was made possible through access to data in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The National Genomic Research Library holds data provided by patients and collected by the NHS as part of their care, as well as data collected as part of their participation in research. Secure access to data is possible by becoming a member of the Genomics England Research Network (formerly GECIP), a network of approved researchers with access to the Genomics England Research Environment. This secure workspace provides a place to carry out research on de-identified datasets in the National Genomic Research Library.