medRxiv

A reassessment of Hardy-Weinberg equilibrium filtering in large sample Genomic studies

Phil J Greer, Anastazie Sedlakova, Mitchell Ellison, Talia DeFrancesco Oranburg, Martin Maiers, C Whitcomb David, Ben Busby
doi: https://doi.org/10.1101/2024.02.07.24301951
1 Ariel Precision Medicine, Pittsburgh, PA, USA: Phil J Greer, Mitchell Ellison, Talia DeFrancesco Oranburg, C Whitcomb David
2 DNAnexus, Mountain View, CA, USA: Anastazie Sedlakova, Ben Busby
3 NMDP, Minneapolis, MN, USA: Martin Maiers
For correspondence: ben.busby{at}gmail.com

ABSTRACT

Hardy-Weinberg Equilibrium (HWE) is a fundamental principle of population genetics. Testing for adherence to HWE with a p-value filter is a common quality control (QC) step used to remove potential genotyping errors before certain analyses. Larger sample sizes increase the power to detect smaller effect sizes, but they also change how these QC filters behave. Here, we test the effects of current HWE QC filtering methods on sample sizes of up to 486,178 subjects, for both imputed and Whole Exome Sequencing (WES) genotypes from the UK Biobank, and propose alternative filtering methods.

METHODS Simulations were performed on imputed genotype data from chromosome 1. A Genome Wide Association Study (GWAS) of the WES data was performed using PLINK2.

RESULTS Our simulations on the chromosome 1 imputed data show a progressive increase in the number of SNPs eliminated from analysis as sample size increases. With the HWE p-value filter held constant at p < 1e-15, the proportion of SNPs removed increases from 1.66% at n=10,000 to 18.86% at n=486,178 in a multi-ancestry cohort, and from 0.002% at n=10,000 to 0.334% at n=300,000 in a European ancestry cohort. Greater reductions are seen in the WES analysis: an 11.91% reduction in analyzed SNPs in a European ancestry cohort (n=362,192) and a 32.70% reduction in a multi-ancestry dataset (n=463,605). Using a sample-size-specific HWE p-value cutoff removes ~2.25% of SNPs in the all-ancestry cohort across all sample sizes, but does not currently scale beyond 300,000 samples. A hard cutoff of ±20% deviation from HWE produces the most consistent results and scales across all sample sizes, but requires additional user steps.
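Why a constant p-value cutoff removes more variants as n grows can be illustrated with a small calculation. This is a minimal sketch, not the authors' pipeline, and it rests on several assumptions: it uses the Pearson chi-square HWE test with 1 degree of freedom rather than the exact test that PLINK2 applies for its --hwe filter; the genotype frequencies (allele frequency 0.3 with a 5% heterozygote deficit) are invented for illustration; and "percent deviation" is assumed to mean the relative difference between observed and expected heterozygote counts, which may not match the definition used in the study. The helper names `hwe_chisq` and `het_deviation` are hypothetical.

```python
import math

def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square HWE test (1 df) for biallelic genotype counts.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A estimated from the sample
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def het_deviation(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Relative deviation of observed from expected heterozygotes (assumed definition)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    expected_het = 2 * n * p * (1.0 - p)
    return (n_ab - expected_het) / expected_het

P_THRESHOLD = 1e-15    # constant HWE p-value filter reported in the abstract
DEV_THRESHOLD = 0.20   # +/-20% hard cutoff, under the assumed definition above

# Illustrative genotype frequencies (AA, Aa, aa): allele frequency 0.3, 5% heterozygote deficit.
FREQS = (0.1005, 0.399, 0.5005)

for n in (10_000, 100_000, 500_000):
    counts = tuple(round(f * n) for f in FREQS)
    chi2, pval = hwe_chisq(*counts)
    dev = het_deviation(*counts)
    print(f"n={n:>7}  chi2={chi2:8.1f}  p={pval:.2e}  "
          f"fails p-filter={pval < P_THRESHOLD}  dev={dev:+.1%}  "
          f"fails dev-filter={abs(dev) > DEV_THRESHOLD}")
```

With the same genotype proportions at every n, the chi-square statistic here equals n·F² with F = 0.05 (F being 1 minus the observed/expected heterozygote ratio), giving 25, 250 and 1250. The variant passes the 1e-15 filter at n=10,000 but fails it at 100,000 and 500,000, while its relative heterozygote deviation stays at -5% throughout. Under this approximation, a fixed p-value cutoff of 1e-15 implies an F threshold of roughly 8/√n, which tightens as the cohort grows; a fixed relative-deviation cutoff does not depend on n.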

CONCLUSION Testing for deviation from HWE may still be an important quality control step in GWAS. However, we demonstrate here that an HWE p-value threshold that is acceptable for smaller sample sizes will be inappropriate for large-sample studies, because it removes an unnecessarily high number of variants prior to analysis. Rather than excluding variants that fail HWE before analysis, it may be better to include all variants in the analysis and examine their deviation from HWE afterward. We believe that adjusting the cutoffs will be even more important for large whole genome sequencing results and more diverse population studies.

KEY TAKEAWAYS

  • Current thresholds for assessing HWE are impractical for large sample sizes.

  • Filtering imputed datasets for HWE regardless of sample size is unnecessary and in fact detrimental if you have a diverse, mixed, or unknown ancestry cohort.

  • WES data show deviation from HWE that is more widely distributed across all Minor Allele Frequencies (MAF).

  • We present an alternative p-value filter for HWE for large sample sizes.

  • We recommend that all genotype data (imputed, WES or WGS) should be analyzed, HWE computed, results combined, and then filtered post-hoc.
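The post-hoc workflow recommended above (analyze every variant, compute HWE, combine, then filter) can be sketched as follows. This is a minimal illustration, not the authors' code: the `VariantResult` record, its field names, the flag labels and the example values are all hypothetical, and the deviation field assumes a relative difference between observed and expected heterozygote counts. The 5e-8 cutoff is the conventional genome-wide significance threshold, used here only to pick hits for review.

```python
from dataclasses import dataclass

@dataclass
class VariantResult:
    variant_id: str   # hypothetical variant identifier
    assoc_p: float    # association p-value from the GWAS run on all variants
    hwe_p: float      # HWE test p-value computed on the same cohort
    het_dev: float    # relative heterozygote deviation (assumed definition)

def flag_post_hoc(results: list[VariantResult], p_threshold: float,
                  dev_threshold: float = 0.20) -> list[tuple[VariantResult, list[str]]]:
    """Annotate every result with HWE flags instead of dropping variants before the GWAS."""
    annotated = []
    for r in results:
        flags = []
        if r.hwe_p < p_threshold:
            flags.append("hwe_p")
        if abs(r.het_dev) > dev_threshold:
            flags.append("het_dev")
        annotated.append((r, flags))
    return annotated

# Hypothetical combined results: association statistics joined with HWE diagnostics.
results = [
    VariantResult("var1", 3e-12, 0.40, -0.01),
    VariantResult("var2", 1e-9, 1e-40, -0.03),
    VariantResult("var3", 2e-10, 1e-60, 0.35),
]

for r, flags in flag_post_hoc(results, p_threshold=1e-15):
    if r.assoc_p < 5e-8:
        status = "review: " + ", ".join(flags) if flags else "ok"
        print(r.variant_id, status)
```

In this toy example, var2 would have been removed by a pre-analysis filter at p < 1e-15 even though its heterozygote deviation is only 3%; here it stays in the results and is simply annotated, while var3 is flagged by both criteria and is the stronger candidate for a genotyping artifact.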

Competing Interest Statement

BB and AS are full-time employees of DNAnexus, Inc. PG and DW are full-time employees of Ariel Precision Medicine Inc.

Funding Statement

Authors were compensated for time contributed to the study by their respective institutions. Their respective institutions also paid for any compute needed to complete experiments on the UK Biobank Research Analysis Platform (UKB RAP).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Discussed and referenced imputation servers per reviewer comments

Data Availability

All data produced are available in supplementary material.

  • Abbreviations

    HWE: Hardy-Weinberg Equilibrium
    GWAS: Genome Wide Association Study
    WES: Whole Exome Sequencing
    WGS: Whole Genome Sequencing
    MAF: Minor Allele Frequency
    SNP: Single Nucleotide Polymorphism
    QC: Quality Control
    EO: European Only
    GSD: Gallstone Disease
    UKB: UK Biobank
    UKB RAP: UK Biobank Research Analysis Platform
    MHC: Major Histocompatibility Complex
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted March 19, 2024.
    A reassessment of Hardy-Weinberg equilibrium filtering in large sample Genomic studies
    Phil J Greer, Anastazie Sedlakova, Mitchell Ellison, Talia DeFrancesco Oranburg, Martin Maiers, C Whitcomb David, Ben Busby
    medRxiv 2024.02.07.24301951; doi: https://doi.org/10.1101/2024.02.07.24301951
    Subject Area

    • Genetic and Genomic Medicine