Revisiting Bias in Odds Ratios

Ivo M Foppa, Fredrick S Dahlgren
doi: https://doi.org/10.1101/2021.02.28.21252604
Ivo M Foppa
1Battelle Memorial Institute, Atlanta, Georgia, USA
2Influenza Division, Centers for Disease Control and Prevention, 1600 Clifton Road NE, Atlanta, 30333 Georgia, USA
3Hessisches Landesprüfungs- und Untersuchungsamt im Gesundheitswesen,Abteilung I, Wolframstraße 33, 35683 Dillenburg, Germany
  • For correspondence: ivo.foppa{at}hlpug.hessen.de
Fredrick S Dahlgren
2Influenza Division, Centers for Disease Control and Prevention, 1600 Clifton Road NE, Atlanta, 30333 Georgia, USA

Abstract

Ratio measures of effect, such as the odds ratio (OR), are consistent, but the presumption of their unbiasedness is founded on a false premise: the equality of the expected value of a ratio and the ratio of expected values. We show that the invalidity of this assumption is an important source of empirical bias in ratio measures of effect, which is due to properties of the expectation of ratios of count random variables. We investigate ORs (unconfounded, no effect modification), proposing a correction that leads to “almost unbiased” estimates. We also explore ORs with covariates. We find substantial bias in OR estimates for smaller sample sizes, which can be corrected by the proposed method. Bias correction is more elusive for adjusted analyses. The notion of unbiasedness of OR for the effect of interest for smaller sample sizes is challenged.

Introduction

Ratio measures of effect are widely used in epidemiology. In particular, for case-control studies of etiology or intervention effectiveness, the odds ratio (OR) is of great importance (Pearce, 1993). The true OR is defined as

$$\phi = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)} \tag{1}$$

where p1 and p0 represent exposure prevalences in cases and controls, respectively. If exposure prevalence remains constant over the study period and subjects are enrolled by “incidence density sampling”, the OR represents the factor by which the “exposure” multiplies the incidence rate in the unexposed (Greenland and Thomas, 1982).

The consistent maximum likelihood (ML) estimator of ϕ (Gart, 1962) is

$$\mathrm{OR} = \frac{x_1 y_0}{x_0 y_1} \tag{2}$$

where x1 and x0 are exposed and unexposed cases and y1 and y0 are exposed and unexposed controls, respectively. In the following discussion we use OR to refer to the ML estimator of the true OR ϕ.

The problem

Here, we investigate bias in the OR, where bias ∊ is

$$\epsilon = \mathbb{E}(\mathrm{OR}) - \phi \tag{3}$$

Assuming independence of x1, x0, y1, y0, 𝔼(OR) can be written as

$$\mathbb{E}(\mathrm{OR}) = \mathbb{E}(x_1)\,\mathbb{E}(y_0)\,\mathbb{E}\!\left(\frac{1}{x_0}\right)\mathbb{E}\!\left(\frac{1}{y_1}\right) \tag{4}$$

but, as neither $\mathbb{E}(1/x_0)$ nor $\mathbb{E}(1/y_1)$ is defined because of zero denominators (Griffin, 1992), the whole expression (4) remains undefined. With the expectation undefined, bias (3) cannot be determined. In the context of an observational study, instances with either no unexposed cases (x0 = 0) or no exposed controls (y1 = 0) will be discarded because no OR (2) can be computed. On average, the OR will therefore be better characterized by a situation where the variables in the denominator (x0, y1) are assigned truncated Poisson distributions; truncation here refers to restriction of the sample space of x to ℤ+. The truncated Poisson distribution, denoted by Poi*(µ), has the following form (Griffin, 1992):

$$P(x = k) = \frac{\mu^{k} e^{-\mu}}{k!\,(1 - e^{-\mu})}, \qquad k = 1, 2, \ldots \tag{5}$$

The expected value of a random variable distributed according to a truncated Poisson is given by

$$\mathbb{E}(x) = \frac{\mu}{1 - e^{-\mu}} \tag{6}$$

Letting $x_0^*$ and $y_1^*$ be distributed according to truncated Poisson distributions parametrized by µ0 and γ1, respectively, a “truncated” OR arises:

$$\mathrm{OR}_t = \frac{x_1 y_0}{x_0^* y_1^*} \tag{7}$$

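The expectation of ORt (7) is defined, but the expectations $\mathbb{E}(1/x_0^*)$ and $\mathbb{E}(1/y_1^*)$ do not have closed-form expressions. However, using Jensen’s inequality (Casella and Berger, 1990), we have

$$\mathbb{E}\!\left(\frac{1}{x_0^*}\right) \ge \frac{1}{\mathbb{E}(x_0^*)} = \frac{1 - e^{-\mu_0}}{\mu_0}, \qquad \mathbb{E}\!\left(\frac{1}{y_1^*}\right) \ge \frac{1}{\mathbb{E}(y_1^*)} = \frac{1 - e^{-\gamma_1}}{\gamma_1} \tag{8}$$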

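Therefore, with µ1 = 𝔼(x1) and γ0 = 𝔼(y0),

$$\mathbb{E}(\mathrm{OR}_t) \ge \frac{\mu_1 \gamma_0}{\mu_0 \gamma_1}\,(1 - e^{-\mu_0})(1 - e^{-\gamma_1}) = \phi\,(1 - e^{-\mu_0})(1 - e^{-\gamma_1}) \tag{9}$$

indicating that the lower bound of 𝔼(ORt) is $\phi\,(1 - e^{-\mu_0})(1 - e^{-\gamma_1})$. However, as $(1 - e^{-\mu_0})(1 - e^{-\gamma_1}) < 1$, this factor might neutralize any bias.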
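The truncated Poisson moments and the Jensen bound can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the series cutoff `k_max` are illustrative assumptions, not part of the original analysis. It sums the series for 𝔼(x) and 𝔼(1/x) under a zero-truncated Poisson and compares them with the closed form µ/(1 − e^(−µ)) and its reciprocal.

```python
import math
from typing import Tuple


def truncated_poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a zero-truncated Poisson with parameter mu, k = 1, 2, ..."""
    if k < 1:
        return 0.0
    # Work on the log scale to avoid overflow in mu**k and k!.
    log_p = k * math.log(mu) - mu - math.lgamma(k + 1)
    return math.exp(log_p) / (1.0 - math.exp(-mu))


def truncated_moments(mu: float, k_max: int = 400) -> Tuple[float, float]:
    """Return (E[X], E[1/X]) by summing the series up to k_max (assumed cutoff)."""
    mean = sum(k * truncated_poisson_pmf(k, mu) for k in range(1, k_max + 1))
    inv_mean = sum(truncated_poisson_pmf(k, mu) / k for k in range(1, k_max + 1))
    return mean, inv_mean


for mu in (5.0, 10.0, 20.0, 50.0, 100.0):
    mean, inv_mean = truncated_moments(mu)
    closed_form = mu / (1.0 - math.exp(-mu))  # closed-form truncated mean
    jensen_bound = 1.0 / closed_form          # lower bound for E[1/X]
    print(f"mu={mu:6.1f}  E[X]={mean:.6f}  closed form={closed_form:.6f}  "
          f"E[1/X]={inv_mean:.6f}  Jensen bound={jensen_bound:.6f}")
```

In this sketch the gap between 𝔼(1/x) and the Jensen bound is largest for small µ, which is the regime in which the text locates the empirical bias.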

“Almost unbiased” estimators of $\mu_1/\mu_0$ and $\gamma_0/\gamma_1$

An “almost unbiased” estimator (Chapman, 1952) for ratios of Poisson parameters such as $\mu_1/\mu_0$ or $\gamma_0/\gamma_1$, however, does exist. Chapman showed that $x_1/w_0$, with w0 = x0 + 1, is “almost unbiased” for $\mu_1/\mu_0$, as long as µ0 is “not too small.” The same holds for $y_0/z_1$, with z1 = y1 + 1, which is “almost unbiased” for $\gamma_0/\gamma_1$.

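The expectation of the ratio $x_1/w_0$ can be derived as follows:

$$\mathbb{E}\!\left(\frac{x_1}{w_0}\right) = \mu_1 \sum_{k=0}^{\infty} \frac{1}{k+1}\,\frac{\mu_0^{k} e^{-\mu_0}}{k!} = \frac{\mu_1}{\mu_0} \sum_{k=0}^{\infty} \frac{\mu_0^{k+1} e^{-\mu_0}}{(k+1)!} = \frac{\mu_1}{\mu_0}\,(1 - e^{-\mu_0}) \tag{10}$$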

The derivation of the expectation of the ratio $y_0/z_1$ follows (10). It is worth noting that the expectation of $x_1/w_0$ is therefore not simply $\mu_1/(\mu_0 + 1)$, as one might naïvely expect with 𝔼(w0) = µ0 + 1, but a negatively biased expression that quickly converges to $\mu_1/\mu_0$ with increasing µ0. The equivalent, of course, holds for the expectation of $y_0/z_1$. Using this,

$$\mathrm{OR}_{+1} = \frac{x_1}{w_0} \cdot \frac{y_0}{z_1} = \frac{x_1 y_0}{(x_0 + 1)(y_1 + 1)} \tag{11}$$

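The expected value of OR+1 is

$$\mathbb{E}(\mathrm{OR}_{+1}) = \frac{\mu_1}{\mu_0}\,(1 - e^{-\mu_0}) \cdot \frac{\gamma_0}{\gamma_1}\,(1 - e^{-\gamma_1}) = \phi\,(1 - e^{-\mu_0})(1 - e^{-\gamma_1}) \tag{12}$$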
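As a numerical check of the expectation derived in (10), the sketch below sums the series for 𝔼[1/(x0 + 1)] with x0 Poisson distributed and compares it with the closed form (1 − e^(−µ0))/µ0 and with the naïve guess 1/(µ0 + 1). It also evaluates the factor (1 − e^(−µ0))(1 − e^(−γ1)) by which, under independence, the expectation of OR+1 falls short of ϕ. The function names, the cutoff `k_max`, and the choice γ1 = µ0 in the loop are illustrative assumptions only.

```python
import math


def expected_inverse_shifted(mu: float, k_max: int = 400) -> float:
    """E[1 / (X + 1)] for X ~ Poisson(mu), summed term by term up to k_max."""
    total = 0.0
    for k in range(k_max + 1):
        log_p = k * math.log(mu) - mu - math.lgamma(k + 1)
        total += math.exp(log_p) / (k + 1)
    return total


def shrinkage_factor(mu0: float, gamma1: float) -> float:
    """E(OR+1) / phi = (1 - exp(-mu0)) * (1 - exp(-gamma1))."""
    return (1.0 - math.exp(-mu0)) * (1.0 - math.exp(-gamma1))


for mu0 in (1.0, 2.0, 5.0, 10.0, 20.0):
    series = expected_inverse_shifted(mu0)
    closed_form = (1.0 - math.exp(-mu0)) / mu0  # closed form of E[1/(x0 + 1)]
    naive = 1.0 / (mu0 + 1.0)                   # naive guess 1 / E(w0)
    factor = shrinkage_factor(mu0, mu0)         # gamma1 = mu0 is illustrative only
    print(f"mu0={mu0:5.1f}  series={series:.6f}  closed={closed_form:.6f}  "
          f"naive={naive:.6f}  E(OR+1)/phi={factor:.6f}")
```

The factor approaches 1 quickly as µ0 and γ1 grow, which is what “almost unbiased” refers to here.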

Therefore, the expected value 𝔼(OR+1) is the lower bound of 𝔼(ORt) (9). In contrast, Hauck et al. recommended adding 0.25 to each of the terms (x1, x0, y1, y0) of the ML estimator to calculate OR+0.25 (Hauck, Anderson, and Leahy III, 1982).

A simulation study

Unconfounded odds ratio

We simulated 100,000 data sets from case-control studies. The number of unexposed cases was simulated according to a Poisson distribution with parameter µ0 ∈ {5, 10, 20, 50, 100, 1000}, with a true incidence rate ratio ϕ = 2 and a control-to-case ratio (ratio of the expected number of controls to the expected number of cases) of 2, corresponding to expected sample sizes of 30, 60, 120, 300, 600 and 6000, respectively. ORs could not be computed for 834 and 6 datasets with µ0 = 5 and µ0 = 10, respectively, because of zero denominators. For µ0 = 5, corresponding to an expected sample size of 30, the average OR was 3.15, while OR+1 from the corrected analysis, which could make use of all datasets, was minimally biased downward, with an MSE a little less than a quarter of that of the uncorrected OR (Table 1).

Table 1: Mean and mean square error (MSE) of OR, OR+1 (adding 1 to the number of unexposed cases and exposed controls) and OR+0.25 (adding 0.25 to the number of unexposed cases and exposed controls), respectively, as a function of µ0. The true OR was ϕ = 2 and the control-to-case ratio was 2. For each value of µ0, 100,000 simulations were run.

For µ0 = 5, OR+0.25 was even more strongly upwards biased than OR and for other values of µ0 only marginally superior to OR, both in terms of bias and in terms of MSE. While both bias and MSE of OR and OR+0.25 were always larger than for OR+1, the differences vanished with large µ0 (Table 1). We did not consider OR+0.25 any further.
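The unconfounded simulation described above can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes µ1 = µ0 (inferred from the expected sample size of 30 at µ0 = 5 with a control-to-case ratio of 2), splits the controls so that µ1γ0/(µ0γ1) = ϕ, uses a simple standard-library Poisson sampler that is only suitable for small means, and defaults to 20,000 replicates rather than 100,000. The names `poisson` and `simulate` are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mu: float) -> int:
    """Draw one Poisson(mu) variate by Knuth's multiplication method (small mu only)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(mu0: float, phi: float = 2.0, ratio: float = 2.0,
             n_sim: int = 20_000, seed: int = 1) -> dict:
    """Simulate case-control counts and summarise OR and OR+1 as (mean, MSE)."""
    rng = random.Random(seed)
    mu1 = mu0                              # assumed: exposure odds of 1 among cases
    n_controls = ratio * (mu0 + mu1)
    odds = (mu1 / mu0) / phi               # exposure odds among controls
    gamma1 = n_controls * odds / (1.0 + odds)
    gamma0 = n_controls - gamma1
    raw, plus1 = [], []
    for _ in range(n_sim):
        x1, x0 = poisson(rng, mu1), poisson(rng, mu0)
        y1, y0 = poisson(rng, gamma1), poisson(rng, gamma0)
        plus1.append(x1 * y0 / ((x0 + 1) * (y1 + 1)))
        if x0 > 0 and y1 > 0:              # the ML estimator is undefined otherwise
            raw.append(x1 * y0 / (x0 * y1))

    def summary(values):
        mean = sum(values) / len(values)
        mse = sum((v - phi) ** 2 for v in values) / len(values)
        return mean, mse

    return {"OR": summary(raw), "OR+1": summary(plus1),
            "discarded": n_sim - len(raw)}


result = simulate(mu0=5.0)
print(result)
```

Because of the different number of replicates and random number generator, the output is not expected to match Table 1 exactly; the qualitative pattern (upward bias of OR, slight downward bias and smaller MSE of OR+1) is what the sketch is meant to show.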

The odds ratio adjusted by one confounder

We also investigated the situation where the odds ratio is confounded by a binary covariate that increases the risk of the outcome by 50%, independently of the exposure of interest, and that is moderately associated with the exposure of interest (confounder odds ratio of exposed vs. unexposed = 1.2). We conducted logistic regression analyses, adjusting for the confounder. We analyzed the data in their native form and after applying one of three corrections:

  1. Adding one to cases unexposed to the exposure of interest and adding one to exposed controls (correction #1), giving OR*1;

  2. Adding one to cases unexposed to either the exposure of interest or the confounder and adding one to controls exposed to either (correction #2), giving OR*2;

  3. Adding one to cases unexposed to both the exposure of interest and the confounder and adding one to controls exposed to neither (correction #3), giving OR*3.
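The adjusted odds ratio in these analyses is the exponentiated exposure coefficient of a logistic regression with the confounder as a covariate. A minimal sketch of that computation on grouped counts is given below, using Newton-Raphson iterations and only the standard library. The function names, the layout of `table`, and the example counts are hypothetical; none of the three corrections is applied here, although a correction could be emulated by incrementing the relevant cells before fitting.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_or(table, n_iter=50):
    """Exposure OR from logit P(case) = b0 + b1*exposure + b2*confounder.

    `table` maps (exposure, confounder) -> (cases, controls).
    """
    beta = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (e, c), (cases, controls) in table.items():
            xrow = [1.0, float(e), float(c)]
            eta = sum(bj * xj for bj, xj in zip(beta, xrow))
            p = 1.0 / (1.0 + math.exp(-eta))
            n = cases + controls
            for i in range(3):
                score[i] += xrow[i] * (cases - n * p)
                for j in range(3):
                    info[i][j] += n * p * (1.0 - p) * xrow[i] * xrow[j]
        step = solve3(info, score)          # Newton step: information^-1 * score
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < 1e-12:
            break
    return math.exp(beta[1])


# Hypothetical counts chosen so that the OR is exactly 2 in both confounder strata.
table = {(0, 0): (10, 20), (1, 0): (20, 20), (0, 1): (15, 20), (1, 1): (30, 20)}
print(adjusted_or(table))
```

With these example counts the model fits the table exactly, so the printed value should be very close to 2.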

We investigated six levels of µ00, which represents the mean number of cases unexposed to both the exposure of interest and the confounder (µ00 ∈ {5, 10, 20, 50, 100, 1000}), corresponding to expected sample sizes of 92, 184, 368, 919, 1,838 and 18,375, respectively. The assumed control-to-case ratio was 2. For each setting we conducted 100,000 simulations and calculated mean and MSE for OR, Embedded Image and Embedded Image. For µ00 ∈ {5, 10} we also computed mean and MSE after excluding 72,382 and 5,762 datasets, respectively, for which the smallest stratum size was < 5. The uncorrected OR was substantially biased upward for sample sizes under a thousand. Embedded Image was essentially unbiased (Table 2). In the restricted analysis for µ00 = 5 the bias for OR was lower than in the unrestricted analysis, but it was the only case for which Embedded Image was substantially biased, downward by about the same amount as OR was biased upward. The other corrections always led to a downward bias and were clearly inferior to Embedded Image, which had consistently lower MSEs.

Table 2: Mean and mean square error (MSE) of OR, OR*1, OR*2 and OR*3 (see text) for different values of µ00 when one confounder is adjusted for (see text) by logistic regression analysis. The odds ratio is the exponentiated model coefficient corresponding to the exposure of interest. For µ00 ∈ {5, 10}, mean and MSE were also calculated after exclusion of datasets in which the smallest stratum size in cases or controls was < 5. The true OR was ϕ = 2 and the control-to-case ratio was 2. For each value of µ00, 100,000 simulations were run.

Discussion and conclusion

We examined bias in ratio measures of effect, in particular ORs. As the expected value of an OR is undefined, the bias is not defined either. This kind of problem for ratios of Poisson random variables is well known (Griffin, 1992); an OR is a ratio of two such ratios. However, even though the bias is not defined for ORs, we can examine the empirical properties of ORs. In fact, ORs are consistent, but more than trivially “biased” (the quotes are owed to the fact that this is not bias in the strict sense), i.e. on average off the true value, even if sample sizes are “reasonable”. This phenomenon has been largely ignored in the epidemiologic literature. Even though these empirical biases are more pronounced for small sample sizes, they are unrelated to sample size problems of large-sample statistical methods. We have shown that empirical biases in ratio measures can be reduced by adding 1 to the denominators. In the absence of confounders and effect modifiers, that adjustment (OR+1; see equation (11)) leads to an “almost unbiased” OR estimate. We also found that the correction proposed by Hauck et al. (Hauck, Anderson, and Leahy III, 1982), adding 0.25 to each count used to calculate the OR, performs poorly.

For the situation where there is one additional covariate we were able to identify a data correction procedure that works well Embedded Image. Future research is needed to better characterize the problem for more complex multivariate situations.

In summary, we examined statistical properties of the expectation of ratios of count random variables as an important source of empirical bias in ORs. This challenges the notion that ORs are, under very general assumptions, “good” estimates for the effects of interest even if sample sizes are relatively small.

Data Availability

Not applicable

Acknowledgments

The authors do not have a conflict of interest.

References

  1. Casella, G. and R.L. Berger (1990). Statistical Inference. Duxbury advanced series. Brooks/Cole Publishing Company.
  2. Chapman, Douglas G (1952). “On tests and estimates for the ratio of Poisson means”. In: Annals of the Institute of Statistical Mathematics 4.1, pp. 45–49.
  3. Gart, John J (1962). “On the combination of relative risks”. In: Biometrics 18.4, pp. 601–610.
  4. Greenland, Sander and Duncan C Thomas (1982). “On the need for the rare disease assumption in case-control studies”. In: American Journal of Epidemiology 116.3, pp. 547–553.
  5. Griffin, Tralissa F (1992). “Distribution of the ratio of two Poisson random variables”. MA thesis.
  6. Hauck, Walter W, Sharon Anderson, and Francis J Leahy III (1982). “Finite-sample properties of some old and some new estimators of a common odds ratio from multiple 2 × 2 tables”. In: Journal of the American Statistical Association 77.377, pp. 145–152.
  7. Pearce, Neil (Dec. 1993). “What Does the Odds Ratio Estimate in a Case-Control Study?” In: International Journal of Epidemiology 22.6, pp. 1189–1192.
Posted March 02, 2021.