Power calculations for detecting differences in efficacy of fecal microbiota donors

Scott W. Olesen

doi:10.1101/2020.04.16.20068361

Abstract

Fecal microbiota transplantation (FMT) is a recommended therapy for recurrent Clostridioides difficile infection and is being investigated as a potential therapy for dozens of other indications, notably inflammatory bowel disease. The immense variability in human stool, combined with anecdotal reports from FMT studies, has suggested the existence of “donor effects”, in which stool from some FMT donors is more efficacious than stool from other donors. In this study, simulated clinical trials were used to estimate the number of patients that would be required to detect donor effects under a variety of study designs. In most cases, reliable detection of donor effects required more than 100 patients treated with FMT. These results suggest that previous reports of donor effects need to be verified with results from large clinical trials and that patient biomarkers may be the most promising route to robustly identifying donor effects.

Introduction

Fecal microbiota transplantation (FMT), the transfer of stool from healthy donors into ill patients, is a recommended therapy for multiply recurrent Clostridioides difficile infection, the most common hospital-acquired infection in the United States [1]. FMT is also being investigated as a potential therapy for dozens of other indications, including inflammatory bowel disease, metabolic diseases, cancer, and central nervous system disorders [1, 2]. Although FMT’s molecular mechanisms for treating C. difficile infection are not fully understood, FMT’s efficacy for treating C. difficile has motivated its experimental use in these other areas [3].

Stool donors are selected by a process of exclusion designed to maximize patient safety. Candidate donors are excluded based on blood and stool tests for known pathogens, risk factors for pathogen carriage, and personal and family history of potentially microbiome-mediated diseases [4]. However, given the enormous complexity of stool (which includes bacteria, viruses, fungi, microbe-derived molecules, and host-derived molecules) and the variability of stool from person to person [5, 6], it stands to reason that stool from different donors could differ in its ability to treat disease. Anecdotes from FMT research, particularly in ulcerative colitis [7, 8], have created interest in the possibility of “donor effects”, that is, in potential variability in the efficacy of different donors’ stool [9, 10, 11, 12].

A “donor effect” refers to the possibility that stool from some donors is more efficacious than stool from other donors for treating some indication. For example, some FMT studies have tested for the possibility that one donor produced more efficacious stool than the other donors in the study [13, 7, 8]. Others have tested whether donors with greater microbiota diversity, or with a differential abundance of some microbial taxon, are more efficacious [14, 15, 16, 17, 18]. Finally, at least one study has tested whether the composition of a donor’s gut microbiota is associated with the outcome of a patient treated with that donor’s stool [19]. If donor effects do exist (that is, if some donors, or some particular stool, are more efficacious than others), they would be crucial to improving FMT as a therapy and to clarifying FMT’s molecular mechanisms [10].

Although multiple studies have tested for donor effects, these tests were all performed post hoc. In no case was a search for a donor effect part of the experimental design. It remains unclear whether today’s FMT studies, mostly 20 to 40 patients in size [2], can be expected to reliably determine whether donor effects exist. A key barrier to discovering donor effects, then, is the lack of a methodology for estimating statistical power. Here we expand on previous theoretical models of donor effects [9, 20] and use simulations to estimate FMT studies’ statistical power to detect donor effects.

Methods

Four designs for detecting donor effects were investigated.

Contingency table model

In this model, variations of which were used in two previous theoretical studies of donor effects [9, 20], donors are assumed to be efficacious (“good”) or inefficacious (“bad”), so the distribution of donor efficacies is bimodal. Donor effects are tested for using a contingency table of patient outcomes by associated donor. This approach was designed to model the statistical tests used to search for donor effects in previous FMT studies [13, 8, 7].

Specifically, a fraction ϕ of donors, the “efficacious” donors, have an efficacy ε₊ (i.e., each patient treated with stool from an efficacious donor has a probability ε₊ of a positive outcome). The remaining proportion 1 − ϕ of donors, the “inefficacious” donors, have efficacy ε₋ < ε₊. To simplify the model, we assume that ϕ = 1/2 and that the mean efficacy (ε₊ + ε₋)/2 = 1/2, leaving only a single parameter Δε = ε₊ − ε₋, so that ε₊ = 1/2 + Δε/2 and ε₋ = 1/2 − Δε/2 (Figure 1).

Figure 1: Contingency table model.

a) In this model, half of donors have efficacy ε₊ and the other half ε₋, with mean efficacy (ε₊ + ε₋)/2 = 1/2. The effect size is Δε = ε₊ − ε₋. b) Donors’ efficacies are drawn from the distribution in a. c) Patient outcomes are drawn based on their associated donors’ efficacies. d) Statistical significance is assessed with a χ² test on the contingency table of patient outcomes by donor. e) Minimum effect size required to reach 80% statistical power in simulated clinical trials.

Simulated clinical trials were run to determine the relationship between the effect size Δε and the statistical power of the study. In each simulated trial, N_P patients were evenly distributed among N_D donors. Each donor’s underlying efficacy was set at random to ε₊ with probability ϕ and to ε₋ otherwise. Each patient’s dichotomous outcome was then drawn at random according to the efficacy of their donor. Heterogeneity among the donors’ efficacies was tested for using a χ² test with Yates’s correction on the N_D × 2 contingency table of patient outcomes by donor.
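The simulation just described can be sketched in Python. This is a hedged re-implementation, not the author’s R code (which is on Zenodo): it assumes NumPy and SciPy, uses scipy.stats.chi2_contingency in place of R’s chisq.test, fixes ϕ = 1/2 and mean efficacy 1/2 as above, and the function name contingency_power is hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency


def contingency_power(delta_eps, n_patients=24, n_donors=4, phi=0.5,
                      n_sims=1000, alpha=0.05, seed=0):
    """Simulated power of the contingency table design (sketch, not the original R code)."""
    rng = np.random.default_rng(seed)
    # Mean efficacy fixed at 1/2, so the two efficacies sit symmetrically around 0.5
    eps_plus, eps_minus = 0.5 + delta_eps / 2, 0.5 - delta_eps / 2
    # Assumes patients are split evenly, i.e. n_donors divides n_patients
    per_donor = n_patients // n_donors
    hits = 0
    for _ in range(n_sims):
        # Each donor is efficacious with probability phi
        efficacious = rng.random(n_donors) < phi
        eff = np.where(efficacious, eps_plus, eps_minus)
        # Number of positive outcomes among each donor's patients
        pos = rng.binomial(per_donor, eff)
        table = np.column_stack([pos, per_donor - pos])  # N_D x 2 table
        # Every patient had the same outcome: no test possible, count as not significant
        if (table.sum(axis=0) == 0).any():
            continue
        # SciPy, like R's chisq.test, applies Yates's correction only to 2 x 2 tables
        p = chi2_contingency(table, correction=True)[1]
        hits += int(p < alpha)
    return hits / n_sims
```

Because donor types are drawn at random, even Δε = 1 cannot give 100% power in this sketch: whenever every donor happens to be of the same type, all outcomes are identical and no donor effect is detectable.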

Donor biomarker model

In this model, donor effects are detected by searching for an association between some continuous-valued donor biomarker and the associated patients’ outcomes. This approach was designed to model the statistical tests used to search for donor effects in previous FMT studies, where the biomarker in question is typically the donor’s gut microbiota community diversity or the abundance of particular microbial taxa in the donor [21].

Specifically, each donor has a continuous-valued biomarker X, drawn from a normal distribution with standard deviation σ, which determines that donor’s efficacy according to ε = logit⁻¹(βX). To simplify the model, we assume that donors with the mean biomarker value have efficacy 1/2 (i.e., X has mean zero). Then the model has one effect size βσ, which is the log odds ratio in donor efficacy per standard deviation increase in biomarker value. In other words, donor efficacies follow a logit-normal distribution (equivalently, donors’ odds of a positive outcome are log-normally distributed) with scale parameter βσ. High β means that donors with different biomarkers have more distinct efficacies; high σ means that a sample of donors will have a wider range of biomarkers and efficacies.

For example, for βσ = 0, all donors have efficacy 1/2. For βσ = 1, a donor with a biomarker one standard deviation above the mean has an efficacy of logit⁻¹(1) ≈ 73%, while a donor one standard deviation below the mean has efficacy logit⁻¹(−1) ≈ 27%. For βσ ≫ 1, nearly half the donors have efficacy close to 0% and the other half close to 100% (i.e., approaching the distribution of efficacies of the contingency table model with Δε = 1). Below the critical value βσ = √2 ≈ 1.41, the distribution of donor efficacies is unimodal; above that value, it is bimodal.
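The worked values and the unimodal/bimodal threshold can be checked numerically. The sketch below assumes NumPy and SciPy (scipy.special.expit is SciPy’s name for logit⁻¹); efficacy_density and count_modes are hypothetical helpers that evaluate the logit-normal density of donor efficacy on a fine grid and count its interior local maxima.

```python
import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm

# Worked values from the text: logit^-1(+1) and logit^-1(-1)
print(round(expit(1.0), 3), round(expit(-1.0), 3))  # 0.731 0.269

GRID = np.linspace(0.001, 0.999, 9999)


def efficacy_density(eps, beta_sigma):
    """Density of eps = logit^-1(beta * X) with X ~ Normal(0, sigma^2),
    i.e. logit(eps) ~ Normal(0, (beta * sigma)^2): a logit-normal distribution."""
    # Change of variables: d logit(eps) / d eps = 1 / (eps * (1 - eps))
    return norm.pdf(logit(eps), scale=beta_sigma) / (eps * (1.0 - eps))


def count_modes(beta_sigma, grid=GRID):
    """Count interior local maxima of the efficacy density on the grid."""
    f = efficacy_density(grid, beta_sigma)
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))


# One mode for 1.0 and 1.3 (below sqrt(2)); two modes for 1.6 and 2.0 (above it)
for bs in (1.0, 1.3, 1.6, 2.0):
    print(bs, count_modes(bs))
```

The threshold follows from setting the derivative of the log density to zero, which gives tanh(y/2) = y/(βσ)² with y = logit(ε); a solution other than y = 0 exists only when (βσ)² > 2.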

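In each simulated clinical trial, N_P patients receive FMT, each from a different donor, and their outcomes are simulated based on the randomly sampled donor biomarkers X (Figure 2). A donor effect is detected using a Mann-Whitney test comparing the biomarker values of donors whose patients had positive outcomes with those of donors whose patients had negative outcomes.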
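A minimal Python sketch of this trial, assuming NumPy and SciPy and using scipy.stats.mannwhitneyu in place of R’s wilcox.test (the function name is hypothetical). It fixes σ = 1, so β equals βσ; this loses nothing, because outcomes depend only on βX and the Mann-Whitney test is unchanged by rescaling X.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import mannwhitneyu


def donor_biomarker_power(beta_sigma, n_patients=24, n_sims=1000, alpha=0.05, seed=0):
    """Simulated power of the donor biomarker design, one patient per donor (sketch)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # sigma fixed at 1 and mean 0: beta = beta_sigma, and the mean donor has efficacy 1/2
        x = rng.standard_normal(n_patients)
        success = rng.random(n_patients) < expit(beta_sigma * x)
        # The test needs at least one donor in each outcome group
        if success.all() or not success.any():
            continue
        # Compare biomarkers of donors with positive vs negative patient outcomes
        p = mannwhitneyu(x[success], x[~success], alternative="two-sided").pvalue
        hits += int(p < alpha)
    return hits / n_sims
```

Exact power estimates from this sketch will differ somewhat from the published figures, since SciPy’s and R’s Mann-Whitney implementations choose exact and asymptotic p-values under different rules.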

Figure 2: Donor biomarker model.

a) Donor biomarkers (ticks) are drawn from a normal distribution (curve) with standard deviation σ. b) Donor efficacies (points) are determined based on their biomarkers X and the log odds parameter β according to ε = logit⁻¹(βX). The effect size is βσ, the log odds ratio in donor efficacy per standard deviation of donor biomarker value. Patient outcomes (colors) are randomly determined based on their corresponding donors’ efficacies. c) A Mann-Whitney test checks whether donor biomarker values differ between donors whose patients had positive outcomes and donors whose patients had negative outcomes. d) Minimum effect size required to reach 80% statistical power in simulated clinical trials.

Donor microbiota model

In this model, a donor effect is detected by looking for a separation in the donor gut microbiota composition by the donors’ associated patient outcomes. This approach was designed to model the investigation performed in Jacob et al. [19].

In this model, as in the contingency table model, donors have one of two efficacies, ε₋ or ε₊, and patients have dichotomous outcomes. Again, we make the simplifying assumptions that ϕ = 1/2 and (ε₊ + ε₋)/2 = 1/2. In this model, donor effects are detected based on separation between the microbiota of donors associated with patients who had positive outcomes and the microbiota of donors associated with patients who had negative outcomes. A PERMANOVA test [22] checks whether donors’ microbiota compositions are associated with their patients’ outcomes.

This model has two relevant effect sizes. The first is the same effect size as in the contingency table model, Δε. The second is more subtle: how distinguishable are efficacious and inefficacious donors in terms of their microbiota composition? To model “strong” versus “weak” separation in microbiota composition, data from previous case-control studies of diarrhea [23] and obesity [24] were used. The gut microbiota compositions of cases with diarrhea are easily distinguishable from those of controls (random forest classifier AUC ≈ 0.98), while obesity cases only weakly separate from healthy controls (AUC ≈ 0.69) [25].

In each simulation, each donor was assigned as efficacious or inefficacious, as in the contingency table model. Then, each efficacious donor was assigned a microbiota composition drawn at random from the controls in one of the two case-control studies, and each inefficacious donor was assigned a case’s microbiota composition (Figure 3). Next, one patient was assigned to each donor, and each patient’s outcome was determined at random according to the associated donor’s efficacy. Finally, a donor effect was searched for with a PERMANOVA test comparing the donors’ microbiota compositions grouped by their patients’ outcomes.

Figure 3: Donor microbiota model.

Two donor efficacies, dichotomous patient outcomes, donor effect measured by separation of microbiota composition. a) An example ordination plot of the data from a microbiota case-control study. b) A subset of cases and controls, representing the simulated donors’ microbiota, are drawn, depending on the number of individuals in the simulated trial. c) Patient outcomes associated with each donor microbiota composition are drawn according to the efficacies ε₊ and ε₋. The effect size Δε determines the values ε₊ and ε₋ as in the contingency table model. A PERMANOVA test is run on the corresponding dissimilarity matrix of microbiota compositions to detect a donor effect. d) Minimum effect size required to reach 80% statistical power when using the contrast between cases and controls from microbiome studies as a proxy for the microbiome signature difference between efficacious and inefficacious donors.

Microbiota compositions were drawn from MicrobiomeHD [26], which processed 16S rRNA taxonomic marker gene sequences as described previously [25]. For the diarrhea study, non-Clostridioides difficile diarrhea patients were used as cases. For the obesity study, obese subjects were used as cases. The beta diversity matrix input into PERMANOVA was computed using the Bray-Curtis distance metric.
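The PERMANOVA step on Bray-Curtis dissimilarities can be sketched in Python as follows. This is an assumption-laden stand-in for vegan’s adonis, not a reproduction of it: permanova implements the standard one-way pseudo-F permutation test with NumPy and SciPy, and microbiota_trial (a hypothetical name) runs one simulated trial from relative-abundance matrices supplied by the caller (rows are subjects, columns are taxa), with ϕ = 1/2 and mean efficacy 1/2 as above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def permanova(dist, groups, n_perm=999, rng=None):
    """One-way PERMANOVA: pseudo-F on a square distance matrix, p-value by permutation."""
    rng = np.random.default_rng() if rng is None else rng
    groups = np.asarray(groups)
    n, labels = len(groups), np.unique(groups)
    a = len(labels)
    d2 = np.asarray(dist, dtype=float) ** 2
    # Total sum of squares: squared distances over all pairs, divided by n
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def pseudo_f(g):
        ss_within = 0.0
        for lab in labels:
            idx = np.flatnonzero(g == lab)
            # The full within-group block counts each pair twice, hence the division by 2
            ss_within += d2[np.ix_(idx, idx)].sum() / 2 / len(idx)
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(groups)
    # Null distribution: recompute pseudo-F after shuffling the group labels
    f_null = np.array([pseudo_f(rng.permutation(groups)) for _ in range(n_perm)])
    return f_obs, (1 + np.sum(f_null >= f_obs)) / (n_perm + 1)


def microbiota_trial(controls, cases, delta_eps, n_donors, rng, alpha=0.05, n_perm=199):
    """One simulated trial with one patient per donor; returns True if a donor effect is detected."""
    good = rng.random(n_donors) < 0.5  # each donor efficacious with probability 1/2
    n_good = int(good.sum())
    profiles = np.vstack([
        controls[rng.choice(len(controls), n_good, replace=False)],         # efficacious donors
        cases[rng.choice(len(cases), n_donors - n_good, replace=False)],    # inefficacious donors
    ])
    eff = np.r_[np.full(n_good, 0.5 + delta_eps / 2),
                np.full(n_donors - n_good, 0.5 - delta_eps / 2)]
    outcome = rng.random(n_donors) < eff
    if outcome.all() or not outcome.any():
        return False  # only one outcome group: nothing to compare
    dist = squareform(pdist(profiles, metric="braycurtis"))
    _, p = permanova(dist, outcome, n_perm=n_perm, rng=rng)
    return bool(p < alpha)
```

In use, controls and cases would be the relative-abundance tables from one case-control study, and the fraction of True results over many calls estimates power at a given Δε.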

Patient biomarker model

In this model, the donor effect is not restricted to dichotomous patient outcomes. Instead, it is assumed that some continuous-valued biomarker is measured in patients after FMT, such as fecal calprotectin in the case of ulcerative colitis [27, 28] or a microbiome biomarker such as microbial engraftment. Heterogeneity among donors is detected using a Kruskal-Wallis test on patient biomarker values grouped by donor.

Specifically, each patient’s biomarker outcome is drawn from a normal distribution with standard deviation σ_P, which is common to all patients, but centered on a mean value µ_d that is specific to the donor d used to treat that patient. The donor-associated values µ_d are themselves drawn from a hyperdistribution with standard deviation σ_D (Figure 4). The relevant effect size is the ratio σ_D/σ_P. For σ_D/σ_P ≫ 1, the variance in patients’ outcomes is due mostly to donor effects. For σ_D/σ_P = 0, the variance is due solely to patient factors.
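A minimal Python sketch of this hierarchical model, assuming NumPy and SciPy and using scipy.stats.kruskal in place of R’s kruskal.test (the function name is hypothetical). It fixes σ_P = 1 so that σ_D equals the effect size; because the Kruskal-Wallis test is rank-based, rescaling all biomarkers leaves it unchanged, so power depends only on σ_D/σ_P.

```python
import numpy as np
from scipy.stats import kruskal


def patient_biomarker_power(ratio, n_patients=24, n_donors=4, n_sims=1000,
                            alpha=0.05, seed=0):
    """Simulated power of the patient biomarker design for sigma_D / sigma_P = ratio (sketch)."""
    rng = np.random.default_rng(seed)
    per_donor = n_patients // n_donors  # assumes an even split of patients among donors
    hits = 0
    for _ in range(n_sims):
        # Donor means mu_d from the hyperdistribution; sigma_P = 1, so sigma_D = ratio
        mu = rng.normal(0.0, ratio, size=n_donors)
        # Each patient's biomarker is centred on their donor's mean, with SD sigma_P = 1
        y = rng.normal(np.repeat(mu, per_donor), 1.0)
        # Kruskal-Wallis test across the donor groups
        p = kruskal(*np.split(y, n_donors)).pvalue
        hits += int(p < alpha)
    return hits / n_sims
```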

Figure 4: Patient biomarker model.

Patient biomarker outcome. a) Each donor (colored ticks) has a mean patient biomarker value drawn from a hyperdistribution with standard deviation σ_D (black curve). b) Patients’ biomarkers (colored ticks) are drawn from distributions (colored curves), all with the same standard deviation σ_P, centered on the mean of their associated donor. The effect size is σ_D/σ_P. c) Donor effects are detected by a Kruskal-Wallis test on patient biomarker values. d) Minimum effect size required to reach 80% statistical power in simulated clinical trials.

Simulations

For all models, the number of patients was varied over 12, 24, 48, 96, and 192. For the contingency table and patient biomarker models, the number of donors was varied over 2, 4, 6, 8, and 12. For the donor biomarker and donor microbiota models, the number of donors was equal to the number of patients. For the contingency table and donor microbiota models, the effect size Δε was varied over [0, 1]. For the donor biomarker model, βσ was varied over [0, log 25]. For the patient biomarker model, σ_D/(σ_P + σ_D) was varied over [0, 1].

At 11 points along a grid of effect sizes spanning each model’s range, 1,000 simulations were conducted to compute the statistical power at the 0.05 significance level. All calculations were performed using R (version 3.6.0) [29]. χ² tests were performed using chisq.test, Mann-Whitney tests using wilcox.test, Kruskal-Wallis tests using kruskal.test, and PERMANOVA tests using the function adonis in the vegan package (version 2.5-6) [30]. Code is available at Zenodo (DOI: 10.5281/zenodo.3755048).
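The text does not state how the minimum effect size for 80% power is read off the 11-point power curve. One plausible approach, assumed here and not taken from the author’s code, is linear interpolation between the last grid point below the target power and the first grid point at or above it. The function name min_effect_for_power is hypothetical.

```python
import numpy as np


def min_effect_for_power(effect_grid, power, target=0.8):
    """Smallest effect size at which the linearly interpolated power curve reaches target.
    Returns None if no grid point reaches the target."""
    effect_grid = np.asarray(effect_grid, dtype=float)
    power = np.asarray(power, dtype=float)
    reached = np.flatnonzero(power >= target)
    if reached.size == 0:
        return None
    i = reached[0]
    if i == 0:
        return float(effect_grid[0])
    # Interpolate between the last point below target and the first point at or above it
    x0, x1 = effect_grid[i - 1], effect_grid[i]
    y0, y1 = power[i - 1], power[i]
    return float(x0 + (target - y0) * (x1 - x0) / (y1 - y0))
```

With Monte Carlo noise, the estimated power curve need not be monotone; this sketch simply takes the first crossing.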

Results

In the contingency table model, 80% statistical power was achieved in simulations similar to a typical FMT trial (i.e., with 24 patients), but the minimum effect size to achieve that power was Δε = 0.76 (Figure 1). In other words, the difference between efficacious and inefficacious donors had to be so great that, on average, 88% of patients assigned to efficacious donors had positive outcomes (i.e., ε₊ = 1/2 + Δε/2 = 0.88), while only 12% of patients assigned to inefficacious donors did (ε₋ = 1/2 − Δε/2 = 0.12). As the number of patients in the simulated trials was increased, the minimum effect size to achieve 80% power decreased: for simulated trials with 192 patients receiving FMT, the minimum effect size was Δε = 0.3.

In the donor biomarker model, 80% statistical power was achieved with 24 patients and a minimum effect size of βσ = 1.5 (Figure 2). This effect size means that a donor with a biomarker one standard deviation above the mean has an efficacy of logit⁻¹(1.5) ≈ 82%, and a donor one standard deviation below the mean has an efficacy of 18%. For 192 patients, this minimum effect size declined to βσ = 0.5 (i.e., donors one standard deviation above the mean have efficacy logit⁻¹(0.5) ≈ 62%, and donors one standard deviation below the mean have efficacy ≈ 38%).

In the donor microbiota model, 80% statistical power was achieved with 24 patients when using the diarrhea case-control data as simulated efficacious and inefficacious microbiota, with a minimum effect size of Δε = 0.76 (i.e., efficacious donors have efficacy ε₊ = 0.88; Figure 3). For 192 patients, the minimum effect size dropped to Δε = 0.29. When using the obesity case-control data, representing a more subtle difference between efficacious and inefficacious donors’ microbiota, 80% power was achieved only with 192 patients and Δε = 0.72.

In the patient biomarker model, 80% statistical power was achieved in simulations with 24 patients with a minimum effect size σ_D/σ_P = 3.4 (Figure 4). In other words, the variability σ_D in mean patient biomarkers induced by different donors must be more than three times larger than the variability σ_P among the biomarkers of patients who receive FMT material from the same donor. For 192 patients, the minimum effect size declined to less than 0.8.

Discussion

In this study, four different models of donor effects were used to estimate the minimum effect size required to achieve reasonable statistical power for a given number of patients. In the contingency table, donor biomarker, and donor microbiota models, 80% statistical power was achievable with 24 patients only when the effect sizes were implausibly large. These results suggest that current FMT trials, which typically include 20 to 40 patients treated with FMT, are unlikely to discover a true donor effect when using one of these approaches, and that large clinical trials will be needed to verify previous reports of donor effects. The patient biomarker model, however, was more powerful than the other approaches.