Impact of small sample sizes on the internal quality control: statistical uncertainties in the determination of root mean square deviations with respect to the sample mean (RMSD) or to a target value (RMSTD)

Christian Beier

doi:10.1101/2020.12.10.20247148

Abstract

Insufficient statistics due to small considered sample sizes can cause distinct problems in internal quality control (IQC) approaches. This issue concerns most of the currently applied IQC concepts either directly (if a root-mean-square-deviation metric is evaluated) or indirectly (if the IQC concept facilitates a standard deviation that was self-evaluated based on a very limited number (n≤30) of control measures). In clinical chemistry a famous example for the latter case is the common implementation of the Westgard Sigma Rules approach.

This study quantifies the statistical uncertainties in the determination of root mean square (total) deviations related to the sample mean (RMSD) or to a target value (RMSTD). It is clearly shown that RMS(T)D values based on small data sets with n<50 samples are accompanied by a significant statistical uncertainty that needs to be considered in adequate IQC limit definitions. Two mathematical models are derived to reliably estimate an optimal adaptation function to adjust IQC limits to short charts of control measures.

This article provides the theoretical background for the novel IQC method “Statistical Monitoring by Adaptive RMSTD Tests” (SMART) intended to monitor limited available numbers of recent control measures (usually n<20). The study also addresses a general problem in specificity of an IQC resulting from too small sample sizes during the evaluation period of the applied in-control standard deviation.

1 Introduction

Quality assurance, and particularly the internal quality control (IQC), are essential requirements to monitor the reliability of processes in industry and health care. An IQC of quantitative methods usually utilizes control sample(s), which are continuously captured. Aside from evaluations of each Single Measurement of a Control sample (SMC values), a statistical retrospective analysis (RA) of recent SMC data allows to early identify changes in accuracy and precision of the measuring system. Efficient and widely used statistical metrics are the root mean square deviation RMSD (aka standard deviation) and the root mean square total deviation RMSTD. The latter refers to a given target value of the control sample instead of the current average.

Based on a chart of consecutive SMC values, the current standard deviation s_n and the unsigned bias (mean inaccuracy) δ_n can be revealed by (y_i: measured SMC, y₀: target value, <y_n>: sample mean, n: sample size of an SMC chart). Particularly in the case of low-frequent sampling, the parameters s_n and δ_n are not strictly separable because:

The measurement inaccuracy (bias) can vary even within a single chart due to unpreventable or tolerated changes in environmental or operating conditions. These shifts contribute to the amount of the standard deviation.
The average values of arbitrary SMC charts of length n scatter themselves with a dispersion of . Thus, the amount of the standard deviation affects the obtained average value of a particular chart of limited size - leading to an arbitrary distortion of the real bias of this chart.
It is therefore suitable to solely focus on the entire analytical measuring uncertainty (aka RMSTD)

The RA has to handle a trade-off between (i) a sufficient number of SMC samples to reliably verify a statistical in-control condition and (ii) a special focus on most recent data to detect changes in measuring precision as fast as possible. Particularly in clinical chemistry, the RA has to deal with generally low-frequent SMC measures (1-2 SMC per day and control sample) as well as limited reasonable collection periods due to short reagent lifetimes or other frequent system interventions. Thus, the considered SMC data set is often very small on a statistical point of view. Some prescribed RA approaches in national guidelines demand only n=15 retrospective values [1]. This small number has consequences for the applied maximum permissible limits of statistical metrics, which are often defined for “sufficiently long” charts of SMC data (i.e., with little or no consideration of remaining statistical uncertainties in determination). The minimum amount of collected SMC data to reliably apply such limits is generally significantly underestimated. To prevent false out-of-control alerts, general limits have to be adapted to account for the increased statistical uncertainty at very limited sample sizes. We thus need to know, how an adequate RMSTD (or RMSD) limit level L(n) must be increased when it is applied to smaller and smaller sample sizes. The functional dependency of L(n) on n is the main topic of this article. In particular, the necessary adaptation function a(n) defined by L(n)=a(n)·L(∞) is investigated, where a(n) is equivalent to the relative limit L_rel(n)=L(n)/L(∞) and L_rel(n→∞)=1. The adaptation function of the RMSTD metric serves as an integral part of the novel and powerful IQC method “Statistical Monitoring by Adaptive RMSTD Tests (SMART)” [2]. This method efficiently covers all basic aspects of an IQC, and it is particularly suitable for low-frequent sampling. Alternatively, the provided results may facilitate a new type of confirmed RMSD-based short-term control rules, applicable in other multirule IQC concepts.

All limits defined in this article are unsigned values specifying a range around the target value of the control sample y₀±L_given.

The “Westgard Sigma Rules” [3-5] is currently the most prominent IQC approach in clinical chemistry. It is a multirule concept of consecutively applied simple tests. The entry rule is a range test with a cut-off limit of three times the expected standard deviation around the expected mean . Both parameters S and need to be determined during a pre-analytical evaluation period, if there are no explicitly prescribed values. Evaluation periods are commonly done using a very small considered sample size (n about 20). The resulting distinct statistical uncertainty implies a high risk of significantly underestimated amounts of S and . This problem and the impact to the Westgard Sigma Rules will be quantified and discussed in Chapter 3.4 of this article.

2 Methods

2.1 The determination of the “true” RMSTD

The abbreviation Δ will be used for simplicity to denote the (almost) exactly determined “true” RMSTD value, which can be revealed by a sufficiently large SMC data set (n>>100). The word “true” emphasizes that there is no remaining statistical uncertainty in the determination of the RMSTD, standard deviation, or mean bias.

The evaluated Δ value of a control sample also defines the smallest permissible limit of the RMSTD (based on and intended for large SMC data). Thus, the Δ value should nevertheless include all kinds of measuring uncertainties that are unpreventable or tolerable during the in-control operation state of the measuring system. Ideally, the determination of Δ is required for each particular combination of a control material and reagent lot. However, it is a common practice to accept a certain level of generalization. In special cases one prescribed general limit value might be sufficient for every use of a measuring system. The determination of Δ utilizes Eq. (2) and ideally requires extensive statistics over several long SMC charts of the same control sample

with different devices,
under different permissible operating and environmental conditions,
including several recalibrations, and
using different lots (if the target value is not lot-specific).

(The entire procedure may include separate statistics of different control samples with analyte concentrations predominantly near the medical decision limit.)

The SMC survey is best realized by sharing of peer-group data. The in-control state is defined by all considered data sets. Thus, a mechanism should be applied to identify and entirely reject suspect data sets that distinctly diverge from the group averages (e.g., an ANOVA test).

The collected overall data set of SMC values still includes outliers. To ensure a robust statistics, distinct outliers should be identified and rejected. This can approximately be done under the assumption that acceptable measures (defining the in-control state) are normal-distributed around the mean, whereas outliers are independently distributed. Thus, the elimination of outliers utilizes a recurrently updated fit of the histogram over all currently accepted SMC values by a Gaussian function. Most diverging SMC values (with respect to the mean) are sorted out as long as the quality of the Gaussian fit of the remaining data set increases.

The deviation and bias of the final Gaussian fit function directly provide the essential parameters S (overall standard deviation), Y_mean (overall mean value), and δ_mean (mean bias). Hence, the true RMSTD is revealed from S and δ_mean based on the general relation (y_i: sample value; y₀: target value; <y_n>: sample mean; n: entire/large sample size), which can be transformed to Some equations in Chapter 2.2 will utilize the following conversion rule to separate S: In this way, the standalone parameter S can finally be eliminated in relative RMSTD metrics. A discussion concerning the range of realistic values of ν is given in Chapter 3.1. For sufficiently large n, the factor f can be substituted by f=1; thus, Δ becomes .

2.2 The maximum error propagation of statistical uncertainties

An applied IQC limit L(n) for single-chart RMSTD values must be equal or higher than the RMSTD of any possible in-control data set of n considered SMC values. The maximally expected RMSTD value of a limited data set can be approximated by the upper limit of the n-dependent confidence interval (CI_Δ^up) of the RMSTD metric (at an adequate confidence level of 0.95 or higher). Here, the upper CI limit is the sum of the true value Δ and the expected statistical uncertainty u(Δ) at a given sample size n To apply reasonable limits, it is beneficial to estimate the curvature of CI_Δ^up at small sample sizes. This curvature will finally provide inference to the necessary adaptation of general IQC limits for the RMSTD metric with regard to an arbitrary (limited) sample size. In cases of very large n, the statistical uncertainty u(Δ) can be neglected, leading to L(n→∞)≥Δ_∞. The basic Eq. (4) of Δ contains two empiric variables S² and Y_mean, having an error due to their statistical uncertainties. The variance S² is preferred instead of the standard deviation, because an entirely unbiased estimator with regard to different sample sizes only exists for S².

To estimate u(Δ) by u(S²) and u(Y_mean) the maximum error propagation (first-order Taylor series expansion) is applied, utilizing the absolute amounts of the partial derivations ∂(Δ)/∂(S²) and ∂(Δ)/∂(Y_mean) According to Eqs. (4) and (5), the required partial derivations lead to Both derivations are always positive.

The sample mean of a normal-distributed variable is itself normal-distributed with a dispersion, equal to the overall standard deviation S divided by the square root of n. Hence, the mean values <y_n> of SMC charts of length n can be standardized according to In literature, CI limits are generally given for the common case that the true parameter value of the total population (e.g., Y_mean) is unknown. Thus, a CI is defined around the usually known value <y_n> of a small data set, where the true value Y_mean is expected to be covered by the CI (at a certain confidence level). Alternatively, an equivalent (i.e., inverted) CI can also be applied to the known global variable Y_mean, where the unknown sample mean <y_n> is expected to lie within the inverted interval around Y_mean accordingly. In case of the mean, the CI definition is symmetric, and the normal and inverted definitions are identical. Though, the asymmetric CI around the global variance S² (see below) differs from the inverted one around the variance s_n^2. of a sample subset. Thus, following CI definitions may differ from those typically mentioned in textbooks.

In brief, the CI and uncertainty of Y_mean are (z _0.95 : quantile of the normal distribution with a significance level of 5%) and The choice of an optimal value of z_0.95 is discussed in Chapter 3.1. The applied overall standard deviation S (revealed as described in Chapter 2.1) is a maximum estimator delimiting the general in-control state. Consequently, the expected real standard deviation of an individual SMC chart is most probably smaller because the device and measuring conditions are almost unchanged. One might argue that the student distribution t_n-1 (instead of z) is favorable in this case, which would result in much higher amplitudes at very small n. Though, the uncertainty of S is already explicitly considered in a separate term (see Eq. (7)). The substitution of z by t_n-1 would thus clearly overestimate u(Y_mean). It would further lead to extremely artificial results of Eq. (29) for n<3.

The uncertainty of the overall variance S² utilize the chi-squared distribution χ² (df: degrees of freedom; s_n²: variance of a sample subset of size n; χ²: chi-squared distribution). Thus, (at a probability of 1-α) the expected variance of an arbitrary n-sized sample series lies within the CI around the “true” variance S² defined by The uncertainty of the overall variance is consequently Only the upper limit of S² is relevant for our purpose; thus, a one-sided confidence level (1-α/2=0.95) is used.

A particular difficulty of the u(S²) term of the CI_Δ^up approach is the question, if the expected sample mean can be considered as known (i.e., equal to the overall Y_mean). The actual true mean value (or true bias) of a realistic SMC chart depends on the specific device, lot, and other measuring conditions. The variation of the true means of SMC charts at different measuring conditions could be significant. It is assumed that Y_mean (based on a peer-group survey) can not be taken as an ideal estimate for the expected mean of a particular chart. Further, u(S²) should specify the error of the pure dispersion of a single chart, where the uncertainty of the contribution |<y_n>-y₀|² is considered separately (see Eqs. (3) and (7)). Thus, Eq. (13) is intended for an unknown expectation value. This means that the unknown sample mean <y_n> is taken as the most suitable reference for the expectation value so s_n²=1/(n-1)*Σ_i→n (y_i -<y_n >)². Consequently, the number of the degrees of freedom is n reduced by one (df=n-1 in Eq. (12)). This decision might be controversial. One may argue that Y_mean relates to S; thus, it is always a proper estimate for the expectation value of any subset mean. This would lead to df=n instead. The latter case requires that the ratio χ²(n-1,0.95)/(n-1) would have to be replaced by χ²(n,0.95)/n in Eq. (13) and in all subsequent equations. The topic is more complex, which is also reflected by the fact that the uncertainty u(S²) is always higher in the case of df=n-1 than in the alternative case with df=n. Here, this issue leads to an intrinsic “inexactness” of the given mathematical model, which has been visualized by function corridors in several figures. The preferred variant given in Eq. (13) indicates the upper limit of the corridor, where the lower limit is defined by the alternative case with χ²(n,0.95)/n. However, it should again be mentioned that the overall S is a maximum estimate of the expected standard deviation of a single chart (due to the multi-chart origin of S). Nevertheless, at a CI of 95% and no considered bias (ν=0), the preferred variant shows best agreement with the MDCI function (see Chapter 3.2).

Combining Eqs. (7), (8), (11), and (14), the entire uncertainty of Δ can now be expressed as According to Eq. (6), the requested upper confidence limit of the total measuring uncertainty under poor statistical conditions is finally given by (n: number of considered SMC values, ν: ratio between δ_mean and S, f=(n-1)/n, χ² and z: quantiles of the chi-squared and normal distributions).

The given mathematical approach is intended to estimate the statistical uncertainty of an RMSTD of small series. However, the presented mathematical model becomes unreliable for any n below about 6, due to (i) the approximative concept of the degrees of freedom (esp. the correction factor f) and (ii) the fact that the underlying Taylor-series development is limited to the first degree, which is not intended for large uncertainties. Thus, a more fundamental (but also more specific) approach is presented in the next chapter to further investigate the amount of the statistical uncertainty at very small sample sizes.

2.3 The Multidimensional Confidence Interval

The following derivation neglects the mean bias to the target value of the control material. Thus, it focuses on the statistical uncertainty of the RMSD (aka empirical standard deviation) in cases of small sample sizes n. As given in Eq. (12), the squared relation between a particular empirical standard deviation s_n of a sample and the true standard deviation S₀ follows a chi-squared distribution. Here, a straight-forward derivation of the statistical uncertainty of RMSD values will be presented, which is comparable but not identical to the derivation of the chi-square function (which is outlined in relevant textbooks).

This fundamental theory extends the CI of single measures to a collective CI of a series of n measures, while keeping the overall confidence level constant (at, e.g., 95%). It will be denoted as the Multi-Dimensional Confidence Interval (MDCI). Hereafter, it is assumed that all (in-control) measures of the control sample are realizations of a normal-distributed totality, and each value had been standardized in advance. The MDCI theory is therefore dedicated to realizations of the standard Gaussian distribution with RMSD(n→∞)=1. Further, all SMC measures are treated to be uncorrelated to simplify the theory.

A standardization requires the knowledge of the true mean value and the true standard deviation of the particular SMC chart, which are unknown. Applying the approach described in Chapter 2.1, the overall parameters Y_mean and S can be revealed instead. By taking Y_mean as the mean value for standardization, the MDCI theory neglects the mean amount of the bias. Under the assumption that the actual bias of a limited SMC chart is variable and normal-distributed around Y_mean, this variable fraction would be almost indistinguishable from pure statistical imprecision on a global view (see Chapter 2.1). The estimator S also considers this fraction; thus, S₀ is overestimated by S. Fortunately, we are interested in maximum estimators to define upper limits, consequently S can be treated as an acceptable substitution for S₀. However, it has to be mentioned that the additional contribution of the variation of the bias to S₀, which is low-frequent, may lead to a noticeable short-term correlation between consecutive SMC measures.

The two-sided confidence interval z₉₅ for a single SMC measure y is given by the integration limit that covers 95% of the area under the normalized Gaussian probability density (GPD) function To define a confidence interval for a complete series of n measures, G must be extended to n dimensions. This is easily understandable by the general fact that the entire probability of a sequence of measures equals to the product of the unique probabilities of each measure. Here, scalar probabilities are just extended to probability distribution functions. Due to it does not matter if the measures occur in concert (multivariate) or sequentially (sample chart). Thus, the multidimensional representation of G equals to the known multivariate Gaussian distribution (y_CI(n): one-dimensional confidence limits; dV: finite Cartesian volume element), where each dimension represents one measure of the SMC sample series. G(y,n) is fully rotationally symmetric around zero, thus best represented in polar co-ordinates by the radius and n-1 orientation angles. Here, we are only interested in delimiting equipotential spherical hypersurfaces of G(y,n) without further angular dependency. Hence, after pre-integration over the entire angular ranges, G(y,n) can be expressed as a normalized radial function (r _CI : radial n-dimensional confidence limit; dΩ: finite spherical volume element in n dimensions). If G(r,n) is integrated over the radius r (origin at zero) of the Gaussian distribution in n dimensions, the size of the finite spherical volume element dΩ increases according to the area of the hypersurface of an n-dimensional sphere [6-8]. Thus, dΩ is the entire surface area of an n-dimensional sphere of radius r multiplied by a finite thickness dr (Γ(n/2): value of the gamma function for n over two).

The values of the gamma function are usually tabulated for small n (see also Table S1 on last page). However, this approach may require values beyond the size of common gamma tables. The following set of equations can be used to reveal any requested value of gamma (i: integer number; “(i)!”: denotes the factorial of i).

View this table:

Table S1:

Relevant metrics Γ(n/2), z(n), and L°(n) for all n≤40 and at the two-sided CI levels 95%, 97,5%, and 99%.

The combination of Eqs. (20) and (21) leads to the final expression of the volume integral over the radial Gaussian probability density function in n dimensions G(r,n) is normalized, which has been approved by both numeric and algebraic integration. To algebraically validate that G equals 1 over the entire range, the integral must be made consistent with a known indefinite integral. The best candidate is the general definition of the gamma function A suitable conformity to the integral term of Eq. (23) can be achieved by substitution with t=0.5·r² in Eq. (23). Under consideration of following relations the normalization factor C of the integral term of Eq. (23) can be revealed by where Eq. (24) has been applied with x=n/2. After rearrangement, C in Eq. (26) is in full agreement with the normalization factor of Eq. (23).

To obtain the common (two-sided) 95% confidence limit z₉₅(n) of a series of n measures, the normalized radial density function (Eq. (23)) has to be integrated from 0 to z₉₅(n) (n: sample size or dimension; r: radius of the spherical partitioning hypersurface; Γ: Gamma function). Thus, the radial integration limit r_CI=z₉₅(n) covers 95% of the hypervolume spanned by the multidimensional GPD distribution.

Obviously, z₉₅ is 1.96 at n=1. For higher n, the limit z₉₅(n) increases according to Eq. (27) and slowly converges towards the square root function of n. According to Eq. (20), the limit z₉₅(n) represents the Euclidean distance r_CI of a series of single confidence limits ensuring a constant overall significance level of 5%. The limit can therefore be utilized to obtain the maximum acceptable RMSD (with 95% confidence) of a sequence of n SMC measures. To obtain such a function of adequate upper RMSD limits L°(n) of n standardized values, z₉₅(n) can directly be implemented into Eq. (2) (considering y₀=0) leading to (° denotes the RMSD limit of an N(0,1)-distributed sample; y_i°: standardized SMC values). L°(n) goes from 1.96 (n=1) to 1 (for n→∞) as drawn in Fig. 1. To apply L°(n) to non-standardized y_i values, L°(n) has to be back-transformed using the overall standard deviation S and Y_mean, thus L_RMSD(n)=S·L°(n)+Y_mean.

Figure 1:

The shape of L° (Eq. (28)) of the MDCI theory is provided graphically (n≤30) and tabularly (n≥30).

3 Results and conclusions

3.1 The uncertainty of the RMSTD at small sample sizes

To provide a general curvature of the increase of the statistical uncertainty at smaller sample sizes, the CI_Δ^up(n) (given in Eq. (16)) is normalized by the true RMSTD value for n→∞, which is as defined at the very end of Chapter 2.1. It leads to the relative adaptation function for RMSTD limits (n: sample size; f=(n-1)/n; ν=δ_mean/S; χ² and z: 95%-quantiles of the chi-squared and normal distributions). The relative CI_Δ^up function L_rel^RMSTD depends on the ratio ν between bias and standard deviation but not on the absolute amount of S.

Although an important goal of measuring techniques should be the minimization of a bias (e.g., by advanced non-linear calibration), the actual biases of current analytic systems in clinical chemistry are often relevant. It is important to mention that the ratio ν of the overall parameters δ_mean and S is not directly comparable to the ratio of the maximum permissible values δ_max and s_max intended for single SMC charts. The IQC limits δ_max and s_max issue from a (deprecated) separate evaluation of inaccuracy and imprecision, where δ_max restricts the permissible variability of the bias of SMC charts. The individual biases (quantified by the particular average value) of single SMC charts vary according to device-specific, operational, and environmental conditions with a scatter parameter S_δ. The procedure to obtain δ_mean and S, presented in Chapter 2.1, includes SMC data from a large amount of SMC charts. In this overall statistics, the variation of the biases of individual charts is thus part of the overall standard deviation term S. The implicitly included variation of biases can be treated as normal-distributed. Pure short-term dispersion (true statistical imprecision) and the variation of biases are further sufficiently independent; thus, the overall variance is the sum of both variances S²=S²_imp +S²_δ. The remaining overall bias δ is just the difference between the overall mean value and the target value. If the target value is a nominal value, δ_mean is expected to be small in relation to S. However, the amount of δ_mean might be still significant, if the target value was revealed by a more precise reference method. In general, the ratio ν=δ_mean/S is expected to be clearly non-zero but distinctly smaller than the ratio φ=δ_max/s_max The ratios φ of almost all analytes in clinical chemistry have been revealed and compared in [2,9]. The range (0.6,2.1) covers almost all φ values, where most of the φ values are near 1.0. It is further assumed that all modern and well-established analytic techniques show ratios below 1.5. According to Eq. (30) an adequate value for ν is approx. 0.6 with an assumed maximum uncertainty of about ±0.5. Nevertheless, ν-values of up to 1.5 are considered in this article. Further, the decision to apply a one- or two-sided 95% CI for the parameter z_0.95 also depends on ν. If the amount of δ_mean is almost as high as S (or even higher), a one-sided CI is more suitable than a two-sided CI. Here, a smooth approximate transition between two-sided (ν=0, z=1.96) and one-sided (ν=1, z=1.645) statistics is applied by the sigmoidal function Because Eq. (29) has only two remaining variables, a 3D-plot is provided in Fig. 2 covering the range ν=0-1.5 and n=2-30. Keeping in mind that L_rel^RMSTD goes to 1.0 at very high n, the dependency on ν in the most relevant range 0.1<ν<1.1 to the entire shape of L_rel(n) is relatively small. The combination of ν=0.6 and z=1.7 will be used as a feasible representative choice for almost all analytes in clinical chemistry.

Figure 2:

A 3D-plot of the relative upper CI-95% limit of the CI_Δ^up approach is drawn according to Eq. (29). The function is not defined at n=1; thus, the given range of n is 2-30. The considered range of the bias-dispersion-ratio ν is 0-1.5, where the z-value has been adapted with regard to ν (see Eq. (31)). The frontal view at the right-hand side additionally marks key quantities of the L_rel^RMSTD function. The minimum value of 1.204 of the given section is reached at n=30; ν=1.5. The initial amplitudes (at fixed n=2) show a roughly parabolic shape from 1.71 (ν=0) via the maximum value 2.175 at ν=0.435 back to 1.741 (ν=1.5).

At a broad CI of 99%, which would result in a very tolerant IQC limit definition, it is strongly recommended to migrate the derivation of CI_Δ^up from maximum error propagation to the Gaussian error propagation (GEP). In this case, Eq. (29) changes to

3.2 Comparison of both mathematical theories

The more fundamental MDCI theory is comparable to the special case of the CI_Δ^up approach without consideration of an overall bias (ν=0). Thus, both theories are compared in Figs. 3, 4 and 5 utilizing the relative functions given in Eq. (28) and Eq. (29), respectively. Both functions converge towards one at infinity. The intrinsic uncertainty of the CI_Δ^up approach with regard to the concept of the degrees of freedom is indicated by a black/gray corridor. The upper black boundary utilizes the given Eq. (29), whereas the lower black boundary represents the substitution of the term χ _n²/(n-1) by χ ²/n in Eq. (29), which has been discussed in Chapter 2.2. Moreover, the CI_Δ^up approach becomes additionally uncertain at very small sample sizes n<6, mainly due to the approximate correction of the degrees of freedom by the prefactor f. Attempts have been made to introduce a more complex correction of the degrees of freedom [10]. However, the theory behind a fine-tuned adaptation of the prefactor f is beyond the scope of this article.

Figure 3:

The graph shows the relative increase of the statistical uncertainty of RMSD values with regard to the sample size n at a CI of 95%. The gray corridor and the flanking black graphs are obtained from the CI_Δ^up approach using Eq. (29) with ν=0 (no bias). The red graph represents the MDCI function (Eq. (28)). The narrower green corridor is taken for comparison. It shows the L_rel,GEP^RMSTD corridor obtained by Eq. (32) with ν=0.6 and z=1.7, applying a Gaussian error propagation at a CI of 95%. (For ν=0, as depicted by the gray corridor, the type of error propagation is irrelevant and Eq. (29) = Eq. (32).) All graphs converge to 1 at n→∞. A value for n=1 is not defined by the CI_Δ^up approach.

Figure 4:

The graphs show the MDCI function (L°, Eq. (28)) and selected curves of the CI_Δ^up approach (L_rel^RMSTD, Eq. (29) or L_rel,GEP^RMSTD, Eq. (32)) with different values of ν (and dedicated z) within the most relevant range 0≤ν≤1.5. The given functions of the CI_Δ^up approach are usually only the upper boundaries of the shape corridors (as directly obtained by the Eq. (29) / (32)). In (D) the complete L_rel,GEP^RMSTD corridor is exceptionally given for ν=0.6 (black lines). Graphs (A) and (B) are revealed considering a CI of 95%, where (C) and (D) base on a CI of 99%. In vertical direction the graphs differ by the type of the error propagation, which is the maximum error propagation L_rel^RMSTD in (A) and (C) and the Gaussian error propagation L_rel,GEP^RMSTD in (B) and (D), respectively. The color-codes indicate the same values of ν in all four graphs except the green lines, which mark the individual local maximum (if one exists) for ν>0. Frontal views of the entire 3D plots of L_rel,GEP^RMSTD are included as picture-in-picture, where the hidden scale is n=2-30 (see Fig. 2). The frontal view assigned to (A) is given in Fig. 2.

Figure 5:

Comparison of the curves of the upper limits of the CI-99%. (A) The amplitudes of the MDCI-99% function (red), the shape corridor of L_rel^RMSTD (ν=0, CI-99%) with variance-based error expansion (black/gray), and the corridor of L_rel^RMSTD (ν=0, CI-99%) with standard-deviation-based error expansion (blue) are compared. (B) The approximate first derivatives of all drawn curves in (A) are compared in addition. The slopes L_rel^RMSTD(n+1)-L_rel^RMSTD(n) or L°(n+1)-L°(n) have been taken as a simple approximation of the first derivative.

At a CI of 95%, the MDCI function is entirely covered by the shape corridor of L_rel^RMSTD with ν=0 as shown in Fig. 3. It can be further concluded that the upper black curve (representing the unmodified Eq. (29)) is in best agreement with the MDCI function under special consideration of the range 1<n<15. Any in-principle thinkable modification of Eq. (29) (e.g., a more complex prefactor f, a derivation with regard to the standard deviation instead of the variance, a substitution of z by the analog quantile t_n-1 of the Student distribution) result in significantly higher differences of the entire L_rel^RMSTD corridor to the MDCI function. The remaining discrepancy at very small n can most probably be dedicated to the approximate adaptation of the degrees of freedom. Thus, the CI_Δ^up curvature tends to be slightly less steep at n<6. It is assumed that this conclusion can be generalized to all CI_Δ^up functions (with ν>0). The question, why the relative CI_Δ^up function at ν=0 is not fully identical to the MDCI function, is addressed in a short appendix of this article.

The green corridor has been added to Fig. 3 for comparison. It marks the shape corridor of the CI_Δ^up approach (L_rel,GEP^RMSTD) utilizing the parameter set ν=0.6 and z_0.95=1.7 (which is considered as representative for analytes in clinical chemistry) in combination with the Gaussian error propagation according to Eq. (32) at a CI of 95%. Surprisingly, the agreement between the upper green boundary of L_rel^RMSTD and the red MDCI-95% function is nearly perfect. This agreement is also visualized in Fig. 4B as black and gray lines. Further, the green corridor in Fig. 3 is equivalent to the black corridor in Fig. 4D except the different CI. Applying the maximum error propagation, the MDCI-95% function is still within the corridor of L_rel^RMSTD with ν=0.6, z_0.95=1.7 as shown in [2].

The situation at a CI of 99% is less clear, because the CI_Δ^up functions increase more rapidly than the MDCI function, going from CI-95% to CI-99%. A comparison is shown in Fig. 4 and discussed below. Further information are provided in the appendix and Fig. 5. Nevertheless, the L_rel,GEP^RMSTD corridor according to Eq. (32) with ν=0.6 and z_0.99=2.4 comes again in agreement with the shape of the corresponding MDCI-99% function. The comparison is illustrated in Fig. 4D and [2].

3.3 Conclusions with regard to small sample sizes

The study quantitatively revealed that a significant statistical uncertainty in the determination of RMSTD or RMSD metrics must be considered if the sample size (n) is below about 50. This uncertainty (or error) can reach additional 100% (or even more) of the true RMS(T)D value. Two mathematical approaches were derived to quantify this uncertainty. The CI_Δ^up approach refers to the entire RMSTD (inclusive bias); however, it lacks precision below about n<6. The second approach (MDCI) represents a more fundamental theory, but it neglects a mean bias with respect to a target value (just RMSD). Both final functions - the relative CI_Δ^up function L_rel^RMSTD (Eq. (29)) and the MDCI function L° (Eq. (28)) - define n-dependent prefactors to the true values of RMSTD or RMSD. Despite differences in the final equations, the L_rel^RMSTD function without bias (ν=0) is very similar to the MDCI function if a CI of 95% is considered, as shown in Figs. 3 and 4.

At very broad CI ranges (CI=99%) the comparison of both approaches is less clear as discussed in Chapter 3.2. and the appendix. As shown in Fig. 4, the MDCI functions generally stay in good agreement with the L_rel,GEP^RMSTD (ν=0.6) corridors at both CI-99% and CI-95%, if the CI_Δ^up approach is switched from maximum to Gaussian error propagation. Nevertheless, a CI significantly above 95% is not recommended. For IQC a combination of a rather narrow CI of about 95% in addition to a small constant offset to all evaluated maximum permissible limits is favorable. This small general bonus allows a slightly extended basic tolerance for limited baseline variation. The offset might be in the order of the difference in maximum amplitudes (at n=2) between the CI-99% and CI-95% functions.

The quantification of the statistical uncertainty allows to optimally adapt general IQC limits for RMSTD results with regard to the sample size of a particular SMC chart or to the size of a retrospective monitoring window of SMC data. More precisely, both revealed functions L° and L_rel^RMSTD represent an adaptation function a(n) to consider limited sample sizes - leading to the n-dependent IQC limit L(n)=a(n)·L(∞). In [2] a simple fit function for a(n) in the range n<30 is suggested. The presented findings led to the development of a novel powerful IQC method [2] that efficiently monitors a very limited available number of recent SMC values.

Established quantitative diagnostic methods in clinical chemistry provide mean-bias-to-overall-deviation ratios in the range 0<ν<1.1. This conclusion is extensively discussed in [2]. The most populated region is expected to be ν≈0.6±0.3. Thus, the black curves (ν=0.6) in Fig. 4 indicate the representative functions for all analytes in clinical chemistry. The red (ν=0) and cyan (ν=1.1) curves mark the approximate boundaries of realistic ratios in clinical chemistry.

3.4 Risks of a short evaluation period in combination with Westgard Sigma Rules

The “Westgard Sigma Rules” is currently a very prominent IQC approach in clinical chemistry. It is a serial multirule concept starting with the 3S entry rule. This rule defines a first in-control range of μ_e±3·S_e also known as the Shewhart rule. Both parameters (mean value μ_e and standard deviation S_e) must be determined during a pre-analytical evaluation period, if the control sample is unlabeled. First, we assume normal-distributed SMC values during the analytic process and a negligible bias (the means of dedicated SMC charts are almost similar to μ_e). The 3S range thus covers 99.9% of the expected data, and the rule causes only about 0.2% false alerts. This sounds comfortable. However, a problem arises if the parameter S_e was insufficiently determined due to small amounts of evaluation data. Several laboratories only use 20 data points during evaluation. Let us define σ as the unknown real in-control standard deviation. The limits ±3·σ would provide the above mentioned false-alert rate of only 0.2%. Considering 20 data points, the risk is high that the real standard deviation has been significantly underestimated. In an unfavorable but still realistic scenario, the obtained S_e is near or below the lower limit of the two-sided 95% CI around σ. This happens on average in 1 of 40 evaluation processes. According to Eqs. (12) and (13), the CI^low is given by Applying this unwittingly underestimated standard deviation, Westgard’s 3S entry rule actually becomes a 2S rule and the rate of false alerts increases to 4%. Thus, on average, one false alert every 25 IQC tests must be expected in this situation. Further, the expectation value μ_e has also been self-determined with a distinct uncertainty: (t_n-1,0.975: 95%-quantile of the Student distribution). At n=20 the uncertainty of the revealed mean would be μ_e±0.47·S_e. The resulting distortion of the real mean further increases the problem of false alerts. It is important to note that all mentioned uncertainties are purely statistical effects. They can not be circumvented by consideration of very different operation conditions during the evaluation period. In fact, the evaluated standard deviation S_e would be even more underestimated (compared to the true σ), if the considered evaluation data not entirely reflect the full spectrum of potential operation conditions.

A pragmatic solution to ensure a sufficient evaluation of μ_e and S_e would be a share of peer-group data combined with an ANOVA analysis or the approach described in Chapter 2.1. A solely self-made parameter evaluation should ideally base on at least 40 values, which would reduce the final rate of false alerts of the 3S rule to below 1.9%. A more tolerant entry rule (e.g., ±3.5S or ±4S limits) of the Westgard Sigma Rules might also be part of a practicable solution, which would be particularly suitable for techniques with a high Sigma Metric.

Data Availability

All data sources are mentioned in the reference list.

Research Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing interests

The author declares no potential conflicts of interest with respect to the research, conclusions, authorship, and/or publication of this article.

Appendix

How does the CI_Δ^up approach at ν=0 (no bias) must be modified to become identically equal to the MDCI function? First, all modifications with regard to a correction of the degrees of freedom are rejected (esp. the prefactor f is neglected). Finally, the error propagation leading to CI_Δ^up has to utilize the standard deviation S instead of the variation S². Please note that a derivation considering the variation has been preferred because (in contrast to the square root of S²) the estimator S²=1/(n-1)·∑(y_i-<y>)² is unbiased.

Utilizing S, the derivative and uncertainty (according to Eq. (8) and (14)) change to Thus, Eq. (16) becomes to The resulting relative function L_rel^RMSTD finally ends up in the MDCI function if f=1 and ν=0.

Based on a CI of 95%, the MDCI function lies within the shape corridor of L_rel^RMSTD (always at ν=0) obtained by the preferred derivation via the error of the variance (see Eq. (29) and Fig. 3). The flanking functions of any L_rel^RMSTD corridor differ by the term χ _(n-1)²/(n-1) (upper curve) or χ _n²/n instead (lower curve), as discussed in Chapter 2.2. The corresponding corridor of the alternate derivation of CI_Δ^up via the error of the standard deviation (as given in Eq. (A.2) above) lies significantly below (not shown for CI-95%).

Based on a CI of 99%, the increases of both L_Δ^RMSTD corridors towards higher amplitudes are stronger than that of the MDCI function; thus, the MDCI function now appears between both corridors. This constellation is drawn in Fig. 5 for the sake of completeness. Please note that the CI_Δ^up approach is based on just a first-order Taylor series expansion. It is assumed that the approach becomes less accurate at higher amplitudes (i.e., at higher CI levels). The corridors in Fig. 5B further demonstrate that almost all kinds of CI_Δ^up derivations clearly fail at n≤4, most probably due to the simplified handling of the degrees of freedom.

Abbreviations

ANOVA: analysis of variance
CI: confidence interval
CI_Δ^up: upper limit of the confidence interval of the RMSTD metric
Δ: the “true” RMSTD value determined using sufficient amounts of data
GEP: Gaussian error propagation
GPD: Gaussian probability density
IQC: internal quality control
MDCI: multidimensional confidence interval
RA: retrospective (statistical) analysis of SMC data
RMSD: root mean square deviation (aka empirical standard deviation)
RMSTD: root mean square total (or target) deviation with respect to a known target value (aka entire analytical measuring uncertainty)
SMART: Statistical Monitoring by Adaptive RMSTD Tests
SMC: single measurement of a control sample

References

[1].↵
Revision of the “Guideline of the German Medical Association on Quality Assurance in Medical Laboratory Examinations – Rili-BAEK” (unauthorized translation), J. Lab. Med. 39 (2015) 26–69. https://doi.org/10.1515/labmed-2014-0046.
OpenUrl
[2].↵
C. Beier, The Statistical Monitoring by Adaptive RMSTD Tests: an efficient, informative, and customizable method for the complete internal quality control intended for low-frequent sampling of control measures, medRxiv (2020). https://doi.org/10.1101/2020.10.08.20209288.
[3].↵
Six sigma-based quality control, Learning guide, Abbott Diagnostics, https://www.corelaboratory.abbott/sal/learningGuide/ADD-00058819_Six_Sigma_Learning_Guide.pdf (accessed Mar 2020).
[4].
J.O. Westgard, Rules and multirules, http://www.westgard.com/westgard-rules-and-multirules.htm (accessed Mar 2020).
[5].↵
S. Westgard, H. Bayat, J.O. Westgard, Analytical sigma metrics: a review of six sigma implementation tools for medical laboratories, Biochem. Med. (Zagreb) 28 (2018) 1–12. https://doi.org/10.11613/BM.2018.020502.
OpenUrl
[6].↵
E.W. Weisstein, Hypersphere, http://mathworld.wolfram.com/Hypersphere.html (accessed Nov 2019).
[7].
D.M.Y. Sommerville, An introduction to the geometry of N dimensions, Dover Publications, New York, 1958.
[8].↵
P. Losket, N.C. Beaulieu, On monotonicity of the hypersphere volume and area. J. Geom. 87 (2007) 96–98. https://doi.org/10.1007/s00022-007-1891-1.
OpenUrl
[9].↵
C. Beier, Recommended changes of the current version of the German Rili-BAEK, J. Lab. Med. 43 (2019) 225–233. https://doi.org/10.1515/labmed-2019-0097.
OpenUrl
[10].↵
J. Gurland, R.C. Tripathi, A simple approximation for unbiased estimation of the standard deviation, Am. Stat. 25 (1971) 30–32. https://doi.org/10.1080/00031305.1971.10477279.
OpenUrl CrossRef