medRxiv

Analyses using multiple imputation need to consider missing data in auxiliary variables

Paul Madley-Dowd, Elinor Curnow, Rachael A. Hughes, Rosie Cornish, Kate Tilling, Jon Heron
doi: https://doi.org/10.1101/2023.12.11.23299810
Paul Madley-Dowd
1 Centre for Academic Mental Health, Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
  • For correspondence: p.madley-dowd{at}bristol.ac.uk
Elinor Curnow
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
Rachael A. Hughes
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
Rosie Cornish
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
Kate Tilling
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
Jon Heron
1 Centre for Academic Mental Health, Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom
2 MRC Integrative Epidemiology Unit at the University of Bristol, United Kingdom
3 Population Health Sciences, Bristol Medical School, University of Bristol, United Kingdom

Abstract

Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables may often themselves be incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data and different missingness mechanisms for the auxiliary variable on bias of an unadjusted linear regression coefficient and the fraction of missing information. We illustrate our findings with an applied example in the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI including the auxiliary variable to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). Careful consideration of the quantity and nature of missing data in auxiliary variables needs to be made when selecting them for use in MI models.
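
The two performance measures named above, bias of the regression coefficient and the fraction of missing information (FMI), can be illustrated with a minimal sketch. It assumes the imputation-specific estimates are pooled with Rubin's rules and uses the large-sample approximation FMI ≈ (1 + 1/m)B / T, where B is the between-imputation variance and T the total variance. The function names `pool_rubin` and `percent_bias` and all input numbers are hypothetical; they are not taken from the authors' repository or results.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m imputation-specific estimates with Rubin's rules.

    estimates: coefficient estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    fmi = (1 + 1 / m) * b / t      # large-sample approximation to the FMI
    return {"estimate": q_bar, "total_var": t, "fmi": fmi}


def percent_bias(estimate, truth):
    """Bias expressed as a percentage of the true effect size."""
    return 100 * (estimate - truth) / truth


# Hypothetical inputs: five imputations of one regression coefficient.
pooled = pool_rubin([0.48, 0.52, 0.50, 0.47, 0.53], [0.010] * 5)
bias_pct = percent_bias(pooled["estimate"], truth=0.5)
```

A larger FMI means the pooled estimate carries more uncertainty due to the missing values. Statistical software typically reports a degrees-of-freedom adjusted version of the FMI, so values from this approximation may differ slightly when the number of imputations is small.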

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

PMD, EC, RAH, RC, KT and JH work in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol, which is supported by the UK Medical Research Council and the University of Bristol (Grant ref: MC_UU_00032/02). EC is supported by the UK Medical Research Council (Grant ref: MR/V020641/1). RAH is supported by a Sir Henry Dale Fellowship that is jointly funded by the Wellcome Trust and the Royal Society (Grant ref: 215408/Z/19/Z). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The MRC and Wellcome (Grant refs: 217065/Z/19/Z; 076467/Z/05/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors, and PMD will serve as guarantor for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). Linked education records were funded by the Wellcome Trust, the MRC and the Department for Education and Skills (Grant refs: 092731; EOR/SBU/2002/121).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval for the applied example (project B4170, searchable on https://proposals.epi.bristol.ac.uk/) was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees - http://www.bristol.ac.uk/alspac/researchers/research-ethics/.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The ALSPAC data used in the applied example cannot be shared publicly for ethical reasons. The study website contains details of all available data through a fully searchable data dictionary (http://www.bristol.ac.uk/alspac/researchers/our-data/). The scripts and folder structure used to run the applied example analysis and simulation study can be found online at https://github.com/pmadleydowd/Missing_auxiliary_variables. Datasets for the simulation study are found within this repository.


  • Abbreviations

    ALSPAC: Avon Longitudinal Study of Parents and Children
    CDF: cumulative distribution function
    CRA: complete records analysis
    DAG: directed acyclic graph
    FCS: fully conditional specification
    FMI: fraction of missing information
    IQ: intelligence quotient
    KS4: Key Stage 4
    MAR: missing at random
    MCAR: missing completely at random
    MI: multiple imputation
    MNAR: missing not at random
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted December 11, 2023.