Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Elinor Curnow (1, 2), Kate Tilling (1, 2), Jon E Heron (1, 2), Rosie P Cornish (1, 2), James R Carpenter (3, 4)
doi: https://doi.org/10.1101/2023.06.16.23291497
1 Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
2 Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
3 Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, London, UK
4 Medical Research Council Clinical Trials Unit at University College London, University of London, London, UK
For correspondence: Elinor Curnow, elinor.curnow{at}bristol.ac.uk

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, imputation models often include variables beyond those required for the substantive analysis (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and with the probability that this variable is missing (i.e. it is a “collider”), its inclusion can induce bias in the MI estimator and may increase SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or the outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial relative to the magnitude of the exposure coefficient. In settings in which complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and consideration of plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
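
The mechanism described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the authors' Stata code from the Supplementary Material. Everything in it is an assumption made for illustration: the variable names (x, u, w, y, z), the linear data-generating models, the coefficient values, and the choice to let missingness in the outcome depend on the fully observed exposure as well as on w. For brevity it uses a simplified imputation step that draws residual noise but ignores uncertainty in the imputation-model parameters, which should matter little for the point estimate at this sample size.

```python
import numpy as np


def slope(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    design = np.column_stack([np.ones(len(x)), x])
    return float(np.linalg.lstsq(design, y, rcond=None)[0][1])


def simulate(n=200_000, beta=1.0, n_imputations=5, seed=2023):
    """Compare exposure-coefficient estimates with and without a collider."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # fully observed exposure
    u = rng.normal(size=n)                  # unmeasured cause of Y and Z
    w = rng.normal(size=n)                  # unmeasured cause of Z and of missingness
    y = beta * x + u + rng.normal(size=n)   # outcome, partially observed
    z = u + w + 0.5 * rng.normal(size=n)    # candidate auxiliary variable (collider)

    # Y is observed with a probability that depends on X and W (assumed),
    # but not on Y itself.
    observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(x + 2.0 * w)))
    n_missing = int((~observed).sum())

    def mi_slope(predictors):
        # Fit the imputation model in the complete records only.
        design = np.column_stack([np.ones(n), *predictors])
        coef = np.linalg.lstsq(design[observed], y[observed], rcond=None)[0]
        resid_sd = np.std(y[observed] - design[observed] @ coef)
        estimates = []
        for _ in range(n_imputations):
            y_completed = y.copy()
            noise = rng.normal(scale=resid_sd, size=n_missing)
            y_completed[~observed] = design[~observed] @ coef + noise
            estimates.append(slope(y_completed, x))
        # Pooled point estimate: the mean across imputed data sets.
        return float(np.mean(estimates))

    return {
        "complete_records": slope(y[observed], x[observed]),
        "mi_without_collider": mi_slope([x]),
        "mi_with_collider": mi_slope([x, z]),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the complete records estimate and the MI estimate that imputes from the exposure alone should both sit close to the true coefficient (beta), whereas the MI estimate that also conditions on z should differ from it noticeably. The size of that difference depends on the assumed coefficients and should not be read as a result from the paper.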

Contribution to the field statement

In multiple imputation (MI), imputation models often include variables beyond those required for the substantive analysis (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and with the probability that this variable is missing (i.e. it is a “collider”), its inclusion can induce bias in the MI estimator and may increase SE. We demonstrate that when the substantive analysis outcome is partially observed, the bias can be substantial relative to the magnitude of the exposure coefficient. In settings in which complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. We recommend a combination of data exploration and consideration of plausible causal diagrams and missingness mechanisms to examine whether potential auxiliary variables are colliders.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The results reported herein correspond to specific aims of grant MR/V020641/1 to investigators Kate Tilling and James Carpenter from the UK Medical Research Council. Elinor Curnow, Jon Heron, Rosie Cornish, and Kate Tilling work in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol, which is supported by the UK Medical Research Council and the University of Bristol (grant no MC_UU_00032/02). James Carpenter is also supported by the UK Medical Research Council (grant no MC_UU_00004/04). The UK Medical Research Council, the Wellcome Trust (grant no 217065/Z/19/Z), and the University of Bristol currently provide core funding for the Avon Longitudinal Study of Parents and Children. Data collection is funded from a wide range of sources.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval for the study was obtained from the Avon Longitudinal Study of Parents and Children Ethics and Law Committee and local research ethics committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the Avon Longitudinal Study of Parents and Children Ethics and Law Committee at the time.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Competing Interests The authors have no relevant financial or non-financial interests to disclose.

  • Data and computing code availability Stata code to verify theoretical results, and also to generate and analyse the data as per the simulation studies is included in Supplementary Material, Section S8. Stata code to analyse the real data example is included in Supplementary Material, Section S9. The real data are not publicly available due to privacy restrictions.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted June 18, 2023.

Subject Area

  • Epidemiology