ABSTRACT
Background Mammograms contain information that predicts breast cancer risk. We recently discovered two novel mammogram-based breast cancer risk measures based on image brightness (Cirrocumulus) and texture (Cirrus). It is not known whether these measures improve risk prediction when fitted together with each other and with an established measure of mammographic density (Cumulus).
Methods We used three studies consisting of: 168 interval cases and 498 matched controls; 422 screen-detected cases and 1,197 matched controls; and 354 younger-diagnosis cases and 944 frequency-matched controls. We conducted conditional and unconditional logistic regression analyses of the individually- and frequency-matched studies, respectively. We reported risk gradients as the odds ratio per adjusted standard deviation (OPERA), that is, per standard deviation of the measure in controls after adjusting for age and body mass index. For models involving multiple measures, we calculated the OPERA equivalent of the area under the receiver operating characteristic curve.
Results For interval, screen-detected and younger-diagnosis cancer, the best fitting models (OPERAs [95% confidence intervals]) were: Cumulus (1.81 [1.41 to 2.31]) and Cirrus (1.7 [1.38 to 2.14]); Cirrus (1.49 [1.32 to 1.67]) and Cirrocumulus (1.16 [1.03 to 1.31]); and Cirrus (1.70 [1.48 to 1.94]) and Cirrocumulus (1.46 [1.27 to 1.68]), respectively. Their OPERA equivalents were: 2.35, 1.58, and 2.28, respectively.
Conclusions Our mammogram-based measures improved risk prediction beyond conventional mammographic density and, except for interval cancers, negated its influence. Combined, these new mammogram-based risk measures are at least as accurate as the current polygenic risk scores (OPERA ~ 1.6) in predicting, on a population basis, which women will be diagnosed with breast cancer.
Historically, mammographic density has been defined as the light or bright areas on a mammogram (we call this Cumulus) and has well-established risk associations with breast cancer overall and with both interval and screen-detected cancers [1]. But there has been debate about the extent to which these associations are due to existing tumours being missed at mammographic screening, especially for interval cancers. It is also not clear if this measure is the only, let alone the best, mammogram-based predictor of breast cancer.
We addressed these issues by trying to discover aspects of a mammographic image that differ between women with and without breast cancer. First, we created Cirrocumulus by redefining mammographic density at, in effect, a higher pixel-brightness threshold, so that it encompasses only the brightest regions [2-5]. Second, we applied machine learning to textural patterns to create Cirrus [6].
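To make the thresholding idea concrete, a minimal Python sketch follows, assuming a mammogram already loaded as a grayscale array together with a mask of breast pixels. The function names and both threshold values are hypothetical; CUMULUS is computer-assisted threshold software in which thresholds are not fixed in advance, so this only illustrates how raising the brightness threshold shrinks the area counted as dense.

```python
import numpy as np


def dense_area(image, breast_mask, threshold, pixel_area_cm2=1.0):
    """Absolute dense area: breast pixels at or above a brightness threshold."""
    image = np.asarray(image, dtype=float)
    breast_mask = np.asarray(breast_mask, dtype=bool)
    dense_pixels = np.count_nonzero((image >= threshold) & breast_mask)
    return dense_pixels * pixel_area_cm2


def percent_dense(image, breast_mask, threshold):
    """Dense area as a percentage of total breast area at the given threshold."""
    breast_pixels = np.count_nonzero(np.asarray(breast_mask, dtype=bool))
    if breast_pixels == 0:
        raise ValueError("breast mask contains no pixels")
    return 100.0 * dense_area(image, breast_mask, threshold) / breast_pixels


# Toy 4x4 "mammogram" with brightness values 0..15; every pixel is breast tissue.
toy_image = np.arange(16).reshape(4, 4)
toy_mask = np.ones_like(toy_image, dtype=bool)

# Hypothetical thresholds: a conventional one and a higher, brightest-only one.
CONVENTIONAL_THRESHOLD = 8
HIGHER_THRESHOLD = 12

print(percent_dense(toy_image, toy_mask, CONVENTIONAL_THRESHOLD))  # 50.0
print(dense_area(toy_image, toy_mask, HIGHER_THRESHOLD))           # 4.0
```

The higher threshold keeps only the brightest quarter of the toy image, mirroring (in a very simplified way) how an absolute measure at a higher threshold captures just the brightest regions.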
We previously fitted each new measure, one at a time, together with the established measure. Both Cirrus and Cirrocumulus were correlated with Cumulus; r~0.4 and 0.6, respectively. Except for interval cancer, we found that, when the new and established measures were fitted together, the Cumulus risk gradient substantially decreased compared with when Cumulus was fitted alone. On the other hand, the Cirrocumulus and Cirrus risk gradients both remained similar to what they were when fitted alone [2-6]. We concluded that “conventional mammographic density predicts interval cancer due to its role in masking, while the new mammogram-based risk measures could have a causal effect on both interval and screen-detected breast cancer” [7].
We assessed the strength of risk prediction, in terms of the ability to differentiate cases from controls on a population basis, using the odds per adjusted standard deviation (OPERA) [8]. Here we used the standard deviation of the residuals for controls after adjusting for age and body mass index, not the standard deviation of the cross-sectional distribution of the measure itself. This is because age and body mass index confound the associations of mammogram-based risk measures with breast cancer risk and need to be adjusted for. After doing this, the resulting risk gradient estimate relates to the adjusted measure, not to the raw measure.
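The adjustment described above can be sketched in Python as follows, assuming the measure has already been transformed to approximate normality. The function name and the choice to fit the age and body mass index regression in controls only are assumptions made for illustration; the text specifies only that the scaling uses the standard deviation of the control residuals.

```python
import numpy as np


def adjust_and_standardise(measure, age, bmi, is_control):
    """Return the measure adjusted for age and BMI, in units of the control residual SD.

    Assumption (not stated in the source): the adjustment regression is fitted in
    controls only and then applied to everyone.
    """
    measure = np.asarray(measure, dtype=float)
    controls = np.asarray(is_control, dtype=bool)
    # Design matrix: intercept, age and body mass index.
    X = np.column_stack([np.ones_like(measure),
                         np.asarray(age, dtype=float),
                         np.asarray(bmi, dtype=float)])
    coef, *_ = np.linalg.lstsq(X[controls], measure[controls], rcond=None)
    residuals = measure - X @ coef
    # Scale by the standard deviation of the residuals for controls.
    return residuals / residuals[controls].std(ddof=1)


# Synthetic example (not study data).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(40, 70, n)
bmi = rng.normal(26, 4, n)
is_control = rng.random(n) < 0.75
measure = 0.02 * age - 0.05 * bmi + rng.normal(0, 1, n)
z = adjust_and_standardise(measure, age, bmi, is_control)
print(round(float(z[is_control].std(ddof=1)), 6))  # 1.0
```

Because the returned values are in units of one adjusted control standard deviation, the exponentiated coefficient from a logistic regression of case status on them would, under these assumptions, be the OPERA.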
OPERA also allows risk factors to be compared and put into perspective in terms of risk discrimination in ways that are not possible using, for example, change in the area under the receiver operating characteristic curve (AUC), which depends on the order in which the factors are included.
The OPERA approach showed that, individually, our new mammogram-based risk measures are among the strongest of all currently known breast cancer risk factors [6]. But it is not known what risk prediction is obtained when these are fitted together with the conventional mammographic density measure.
In this paper we aimed to determine the extent to which our new measures are correlated with one another and the extent to which risk prediction is improved when our new measures are fitted together and with the established mammographic density measure. In doing so, we sought to find out how risk prediction obtained from combining our new measures compares with that from other breast cancer risk factors.
Methods
We used data from three independent studies: (i) a nested case-control study of 168 cases with interval breast cancer (those diagnosed within two years of a negative screen) and 498 matched controls within the Melbourne Collaborative Cohort Study (MCCS) [5, 9-11]; (ii) a nested case-control study of 422 cases with screen-detected breast cancer and 1,197 matched controls within the MCCS [5, 9-10]; and (iii) a case-control study of 354 cases with on average younger-diagnosis breast cancers and 944 controls from the Australian Breast Cancer Family Study and the Australian Mammographic Density Twins and Sisters Study, both over-sampled and frequency matched for family history [3].
For all studies, the average ages at diagnosis of the cases and the average ages at interview of the controls were similar. For studies (i) and (ii), the average times between mammogram and diagnosis were 6 and 5 years (standard deviation 3) for screen-detected and interval cancers, respectively. Mean (standard deviation) age at diagnosis was 62.3 (7.3) years for the interval cancers, 64.3 (8.2) years for the screen-detected cancers, and 48.5 (10.7) years for the younger-diagnosis cases. For study (iii), by design 30% of cases and 29% of controls had a family history of breast cancer compared with 10% of controls in studies (i) and (ii).
Mammogram-based measures
We used digitised film mammograms. The Cumulus and Cirrocumulus measures had been created using the computer-assisted threshold software CUMULUS [12]. The Cirrus measures were those created previously [6]. All measures were transformed to approximate normality, adjusted for age and body mass index, and scaled by the standard deviation of the residuals for controls. For Cirrocumulus we used the absolute measure because it has less measurement error. For Cumulus we used the percentage measure because it was the better predictor of interval cancer [10].
Statistical Methods
All analyses were conducted using the Stata software [13]. For descriptive purposes, we presented the numbers of cases and controls for each pair of measures, and the estimated risks relative to the population average for different tertile-by-tertile categories (based on controls), using the cci command. The control data were used to investigate the joint distributions of the pairs of measures and to determine the proportion of the population in different categories.
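A Python sketch of the tertile-by-tertile tabulation is shown below; it is an illustration, not the Stata cci workflow used in the analysis. Tertile cut-points come from controls, as stated above. The "crude relative risk" (share of cases in a cell divided by share of controls in that cell) is an assumed approximation to risk relative to the population average, reasonable only if controls represent the population and the disease is rare, and it carries no confidence interval. All names and data are hypothetical.

```python
import numpy as np


def tertile_by_tertile(x, y, is_case):
    """Cross-classify two measures by tertiles defined in controls.

    Returns 3x3 arrays of case counts, control counts, the proportion of
    controls in each cell, and a crude risk relative to the population average
    (proportion of cases / proportion of controls in each cell).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cases = np.asarray(is_case, dtype=bool)
    controls = ~cases
    # Tertile cut-points are estimated from controls only.
    x_cuts = np.quantile(x[controls], [1 / 3, 2 / 3])
    y_cuts = np.quantile(y[controls], [1 / 3, 2 / 3])
    # np.digitize gives 0, 1, 2 for the lowest, middle and highest tertile.
    x_t = np.digitize(x, x_cuts)
    y_t = np.digitize(y, y_cuts)
    n_cases = np.zeros((3, 3), dtype=int)
    n_controls = np.zeros((3, 3), dtype=int)
    np.add.at(n_cases, (x_t[cases], y_t[cases]), 1)
    np.add.at(n_controls, (x_t[controls], y_t[controls]), 1)
    control_share = n_controls / n_controls.sum()
    case_share = n_cases / n_cases.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        crude_rr = case_share / control_share
    return n_cases, n_controls, control_share, crude_rr


# Toy data: 300 controls with evenly spread values, 100 cases shifted upwards.
rng = np.random.default_rng(1)
x = np.concatenate([np.arange(300.0), rng.normal(200, 50, 100)])
y = np.concatenate([rng.permutation(300).astype(float), rng.normal(200, 50, 100)])
is_case = np.concatenate([np.zeros(300, dtype=bool), np.ones(100, dtype=bool)])
n_cases, n_controls, control_share, crude_rr = tertile_by_tertile(x, y, is_case)
print(n_controls.sum(axis=1))  # [100 100 100]
```

Each row and column of the control table holds one third of controls by construction; the cell proportions then give the share of the population in each joint category.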
Risk gradients were estimated using conditional logistic regression for the two nested matched case-control studies and using unconditional logistic regression for the frequency-matched case-control study. To compare fits, we used the likelihood ratio criterion [14] with P<0.05 considered to be the threshold for nominal statistical significance.
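The likelihood ratio comparison of nested models can be sketched in Python as below, using scipy.stats.chi2 for the upper-tail probability. The function name and the log-likelihood values in the example are hypothetical, not taken from the study.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Compare nested models fitted to the same data.

    The statistic 2 * (logL_full - logL_reduced) is referred to a chi-squared
    distribution whose degrees of freedom equal the number of extra parameters.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2.sf(statistic, df=extra_params)
    return statistic, p_value


# Hypothetical log-likelihoods: adding one measure to a one-measure model.
stat, p = likelihood_ratio_test(loglik_reduced=-410.0, loglik_full=-404.5, extra_params=1)
print(f"LR = {stat:.1f}, P = {p:.4f}")
```

With one extra parameter, a statistic above about 3.84 corresponds to P<0.05, the nominal threshold used here.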
The risk gradient, and hence the ability to differentiate cases from controls on a population basis, was reported as the OPERA, with each measure adjusted for age at mammogram and body mass index [8]. We present OPERA estimates in the tables for ease of interpretation and log(OPERA) in the text because it is the natural scale on which to assess risk gradients.
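Moving between the two scales only requires exponentiation. The following Python sketch (function name hypothetical) converts a log(OPERA) and its 95% confidence limits, using the screen-detected Cirrus estimates reported in the Results; the output agrees with the OPERA-scale values given in the abstract.

```python
import math


def to_opera_scale(log_estimate, log_lower, log_upper):
    """Convert a log(OPERA) estimate and its 95% CI limits to the OPERA scale."""
    return tuple(round(math.exp(v), 2) for v in (log_estimate, log_lower, log_upper))


# Screen-detected cancer, Cirrus in the best-fitting model (log-scale values from the Results).
print(to_opera_scale(0.40, 0.28, 0.51))  # (1.49, 1.32, 1.67)
```

Note that the interval on the OPERA scale is not symmetric about the estimate, which is one reason the log scale is more natural for comparing risk gradients.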
Under the assumptions of a multiplicative risk model for a normally distributed risk factor, $\mathrm{AUC} = \Phi\left(\log(\mathrm{OPERA})/\sqrt{2}\right)$, where Φ is the standard normal, N(0,1), cumulative distribution function (see Supplementary Material in [6]), so that the AUC is approximately linearly related to log(OPERA), at least in the range of AUC from 0.5 to 0.7 (OPERA from 1 to 2). Under the multiplicative risk model, log(OPERA) is equal to the difference in means between cases and controls divided by the standard deviation of the adjusted risk factor; the inter-quartile risk ratio is approximately $\mathrm{OPERA}^{2.5}$. When we fitted multiple risk measures together, we used the AUC and the formula above to calculate the corresponding (equivalent) log(OPERA) as if we had fitted one combined measure.
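The conversion between AUC and log(OPERA) can be sketched with Python's standard-library statistics.NormalDist; the function names are hypothetical. Applied to the younger-diagnosis result, an AUC of 0.72 corresponds to a log(OPERA) of about 0.82, and vice versa.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def auc_from_log_opera(log_opera):
    """AUC = Phi(log(OPERA) / sqrt(2)) under the multiplicative normal model."""
    return _STD_NORMAL.cdf(log_opera / math.sqrt(2))


def log_opera_from_auc(auc):
    """Inverse relation: log(OPERA) = sqrt(2) * Phi^{-1}(AUC)."""
    return math.sqrt(2) * _STD_NORMAL.inv_cdf(auc)


def approx_interquartile_risk_ratio(opera):
    """Approximate inter-quartile risk ratio, OPERA to the power 2.5."""
    return opera ** 2.5


print(round(auc_from_log_opera(0.82), 2))  # 0.72
print(round(log_opera_from_auc(0.72), 2))  # 0.82
```

For several measures fitted together, the same inversion would be applied to the AUC of the combined model's fitted linear predictor (an assumption about how the combined AUC is obtained), giving the equivalent log(OPERA). The inter-quartile helper encodes only the approximate rule stated in the text.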
Results
Combining data from the three control samples, the correlation between Cirrocumulus and Cirrus was 0.3, while the correlations of percentage Cumulus with Cirrocumulus and Cirrus were both 0.5; see Supplementary Figures. The standard errors of these correlations were ~0.02.
Interval breast cancer
Table 1 shows that when Cumulus and Cirrus, or Cirrocumulus and Cirrus, were fitted together, their risk associations both attenuated but remained significant. When all three were fitted together, Cirrocumulus was not significant (P=0.6). The best-fitting model involved Cumulus and Cirrus with log OPERAs of 0.59 (95% confidence interval [CI] 0.34 to 0.84) and 0.54 (95% CI, 0.32 to 0.76), respectively. The AUC was equivalent to log(OPERA) = 0.85 (95% CI, 0.68 to 1.04).
For interval cancer, OPERA (95% CI) estimates of odds ratio per adjusted standard deviation from univariable and multivariable analyses of Cumulus (as a percentage), Cirrocumulus and Cirrus, all normalised, adjusted for age and body mass index, and standardised.
Women in the highest tertiles of both Cumulus and Cirrus are at 2.51 (95% CI, 1.72 to 3.64) times population risk (P<0.001), and this group comprised 17% of controls. At the other extreme, women in the lowest tertiles of both Cumulus and Cirrus are at 0.22 (95% CI, 0.08 to 0.49) times population risk (P<0.001), and this group comprised 19% of controls; see Supplementary Table 1. Similar findings were obtained when stratifying women by tertiles of both Cirrocumulus and Cirrus; see Table 2.
For Cirrus and Cumulus, risk relative to the population average, with 95% CI in parentheses, and numbers of cases and controls as a ratio, by tertile-by-tertile category, for interval cancer.
For Cirrus and Cirrocumulus, risk relative to the population average, with 95% CI in parentheses, and numbers of cases and controls as a ratio, by tertile-by-tertile category.
Screen-detected breast cancer
Table 3 shows that, when Cumulus was included with Cirrus or Cirrocumulus, there was no improvement in fit and the Cumulus estimate was no longer significant. When Cirrocumulus and Cirrus were fitted together, their risk associations both attenuated but remained significant. The best fitting model involved Cirrus and Cirrocumulus with log OPERAs of 0.40 (95% CI, 0.28 to 0.51) and 0.15 (95% CI, 0.03 to 0.27), respectively. For the combined measures, the AUC was equivalent to log(OPERA) = 0.46 (95% CI, 0.34 to 0.58).
For screen-detected cancer, OPERA (95% CI) estimates of odds ratio per adjusted standard deviation from univariable and multivariable analyses of Cumulus (as a percentage), Cirrocumulus and Cirrus, all normalised, adjusted for age and body mass index, and standardised.
Women in the highest tertiles of both Cirrocumulus and Cirrus are at 2.01 (95% CI, 1.54 to 2.61) times population risk (P<0.001), and this group comprised 15% of controls. At the other extreme, women in the lowest tertiles of both Cirrocumulus and Cirrus are at 0.52 (95% CI, 0.33 to 0.78) times population risk (P<0.001), and this group comprised 14% of controls; see Table 2.
Younger-diagnosed cancer
Table 4 shows that, of the measures fitted alone, there was very strong evidence that Cirrus gave the best fit, and that the fit improved further when Cirrocumulus was added (P<0.001). The addition of Cumulus did not improve the fit (P=0.8). The best-fitting model involved Cirrus and Cirrocumulus with log OPERAs of 0.53 (95% CI, 0.39 to 0.66) and 0.38 (95% CI, 0.24 to 0.52), respectively. The AUC was equivalent to log(OPERA) = 0.82 (95% CI, 0.70 to 0.96).
For younger-diagnosis cancer, OPERA (95% CI) estimates of odds ratio per adjusted standard deviation from univariable and multivariable analyses of Cumulus (as a percentage), Cirrocumulus and Cirrus, all normalised, adjusted for age and body mass index, and standardised.
Women in the highest tertiles of both Cirrocumulus and Cirrus are at 2.54 (95% CI, 1.93 to 3.33) times population risk (P<0.001), and this group comprised 16% of controls. At the other extreme, women in the lowest tertiles of both Cirrocumulus and Cirrus are at 0.39 (95% CI, 0.24 to 0.62) times population risk (P<0.001), and this group comprised 17% of controls; see Table 2.
Figure 1 shows that the receiver operating characteristic curves for Cirrus and Cirrocumulus have different shapes and cross. For Cirrus, the sensitivity increased rapidly from zero as the specificity decreased from 1, while for Cirrocumulus, the specificity increased rapidly from zero as the sensitivity decreased from 1.
Receiver operating characteristic (ROC) curves for Cirrocumulus and Cirrus for case-control study of younger-diagnosis cancer.
Discussion
Our new mammogram-based risk measures based on brightness (Cirrocumulus) and texture (Cirrus) improved risk prediction beyond the established measure of mammographic density (Cumulus). For all three studies, the best fitting model included several measures and performed substantially better than using the established measure alone (all P<0.001). Except for interval cancers, the new measures also negated the importance of the established measure on risk prediction.
We also found that, when combined, the new mammogram-based risk measures are at least as accurate in identifying women who will be diagnosed with breast cancer as the recently published polygenic risk score, which has an OPERA of ~1.6 [15] or log(OPERA) ≈ 0.48. For younger-diagnosis breast cancer, the AUC for the combination of our measures was 0.72, equivalent to log(OPERA) = 0.82. Therefore, in terms of differentiating women with or without breast cancer at a young age, our measures were, on the log(OPERA) scale, ([0.82 - 0.48]/0.48) × 100 ≈ 70% more accurate than the polygenic risk score. It is plausible that inclusion of a polygenic risk score with the mammogram-based risk measures will further improve risk prediction [16].
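The percentage comparison is made on the log(OPERA) scale. As a quick Python check using only the values quoted in the text (variable names are illustrative):

```python
# Values quoted in the text, not recomputed here.
log_opera_combined = 0.82  # Cirrus + Cirrocumulus, younger-diagnosis cancer
log_opera_prs = 0.48       # polygenic risk score, OPERA ~1.6

relative_gain_percent = (log_opera_combined - log_opera_prs) / log_opera_prs * 100
print(f"{relative_gain_percent:.1f}%")  # 70.8%
```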
On a population basis, therefore, the combination of these new measures appears to be the strongest of all known breast cancer risk factors. For example, when Cirrocumulus and Cirrus were combined to predict younger-diagnosis breast cancer (see Table 4), the OPERA equivalent was 2.28, so the inter-quartile risk ratio is ~7-fold. In comparison, the inter-quartile risk ratio is ~4-fold for a multigenerational family history risk score in predicting breast cancer before age 50 years, ~3-fold for the latest polygenic risk score, ~2-fold for conventional mammographic density, ~1.5-fold for BRCA1 and BRCA2 mutations, and ~1.2-fold or less for lifestyle-related risk factors [7,8].
From the contrasting shapes of their receiver operating characteristic curves, it can be seen that Cirrus has greater sensitivity (the proportion of cases correctly identified) at high specificity, while Cirrocumulus has greater specificity (the proportion of controls correctly identified) at high sensitivity. Therefore, Cirrus does better at identifying women at higher than average risk, while Cirrocumulus does better at identifying women at lower than average risk.
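To make the reading of sensitivity at high specificity, and of specificity at high sensitivity, concrete, here is a minimal Python sketch of an empirical ROC curve and the Mann-Whitney form of the AUC. All function names and the toy scores are hypothetical; applied to Cirrus and Cirrocumulus scores for cases and controls, functions like these would let the two curves be compared at either end.

```python
import numpy as np


def roc_points(case_scores, control_scores):
    """Empirical ROC: sensitivity and specificity at every observed threshold.

    A woman is classed as positive when her score is >= the threshold; an
    infinite threshold is included so that the point (sens 0, spec 1) exists.
    """
    cases = np.asarray(case_scores, dtype=float)
    controls = np.asarray(control_scores, dtype=float)
    observed = np.unique(np.concatenate([cases, controls]))[::-1]  # high to low
    thresholds = np.concatenate(([np.inf], observed))
    sensitivity = np.array([(cases >= t).mean() for t in thresholds])
    specificity = np.array([(controls < t).mean() for t in thresholds])
    return thresholds, sensitivity, specificity


def sensitivity_at_specificity(case_scores, control_scores, min_specificity):
    """Best sensitivity while keeping specificity >= min_specificity (in [0, 1])."""
    _, sens, spec = roc_points(case_scores, control_scores)
    return float(sens[spec >= min_specificity].max())


def specificity_at_sensitivity(case_scores, control_scores, min_sensitivity):
    """Best specificity while keeping sensitivity >= min_sensitivity (in [0, 1])."""
    _, sens, spec = roc_points(case_scores, control_scores)
    return float(spec[sens >= min_sensitivity].max())


def auc_mann_whitney(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    cases = np.asarray(case_scores, dtype=float)[:, None]
    controls = np.asarray(control_scores, dtype=float)[None, :]
    return float((cases > controls).mean() + 0.5 * (cases == controls).mean())


toy_cases, toy_controls = [3, 4, 5], [1, 2, 3]
print(round(auc_mann_whitney(toy_cases, toy_controls), 3))  # 0.944
```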
Cirrus gave similar risk gradients in all three settings, suggesting it is tapping into new and fundamental risk-predicting aspects of a mammogram. This was also evident in the original work developing Cirrus, which found that a similar risk prediction was achieved for women of Japanese ancestry living in Hawaii and for Australian women [6]. Note that Cirrus was designed not to depend on brightness and has only a modest correlation with the brightness measures.
Our new mammogram-based measures are potentially of substantial clinical and population health significance. They not only identify groups of women at substantially increased risk, but they also identify larger groups of women at decreased risk. When categorised by tertiles, Cirrus and Cirrocumulus divide the population into two extreme groups of approximately the same size (each about 15-20%) containing women who are on average either at twice or more population risk, or at half or less population risk; see Table 2. For interval cancer, about 60% of controls were in the six categories with below population average risk, while for both screen-detected and younger-diagnosis cancer, about 75% of controls were in the six categories with below population average risk.
These observations are highly relevant to considerations of tailored, or personalised, screening based on risk, for which there are now several trials being conducted across the world. These include the Wisdom Study in the United States [17,18], My personalised breast screening (MyPeBS) in France (https://clinicaltrials.gov/ct2/show/NCT03672331), and PROCAS2 in the United Kingdom (https://preventbreastcancer.org.uk/breast-cancer-research/research-projects/early-detection-screening/procas/).
These risk categorisations are in stark contrast to those using BI-RADS alone. Currently, about 40% of women screened in the United States are classified as having dense breasts, defined as BI-RADS categories c or d. As a result of a community-led initiative [19], in 35 states it is mandated by law that these women are informed. Research studies in which one or a few radiologists measure BI-RADS in a controlled manner suggest the increased risk associated with having dense breasts is about 1.6- to 2.2-fold (see IBIS [20] and BOADICEA [21]).
In practice, BI-RADS is measured by multiple radiologists at a given screening service, especially over time, opening the potential for substantial measurement error. For example, from the Supplemental data on 60,000 women screened at a large United States medical center [22], the odds ratio for being classified as having dense breasts is only about 1.1, which is far less than found by the research studies (P<0.001). This was despite the measurements being recorded by “radiologists who specialized in breast imaging and who had 5-33 years of experience following the American College of Radiology BI-RADS lexicon” [23]. It would appear, therefore, that in practice there could be so much variation across measurers, even experienced specialists in a large city-based service, that clinical BI-RADS measurements might be providing very little information on risk stratification across the population.
There is substantial scope for better addressing the issue of dense breasts by going beyond BI-RADS. A major consequence of having dense breasts is an increased risk of interval cancer. We and others have found that, as well as conventional mammographic density (Cumulus), having a family history and other risk factors, such as cumulative exposure to ovarian hormones based on the Pike model [24], combine to predict interval cancer [11]. In this study we found that Cirrus provides almost as much information on interval cancer risk as Cumulus, and when combined they have an inter-quartile risk ratio for interval cancer of almost 9-fold. Future work will consider how prediction of interval cancer, and even of missed cancers, can be further improved by combining mammogram-based measures with family history, genetic risk scores and other risk factors. This could have a profound impact on the way the issue of dense breasts is addressed in the future.
For our findings to be translated into wider clinical practice, automated use of the mammogram-based and other risk measures needs to be implemented. We are developing a program to measure Cirrus automatically from batch files of mammograms and are using deep learning to develop similar automated measures of Cumulus and Cirrocumulus. We are also building the empirical evidence, as in this and other papers [11, 16], on how mammogram-based risk measures combine with each other and with other important risk factors to predict risk.
In conclusion, while the established mammographic density measure improved prediction of interval cancer, most likely due to its role in masking tumours, it provided no substantive additional information on risks of screen-detected or younger-diagnosis cancer in addition to our new mammogram-based risk measures. Our findings demonstrate the potential for much improved and more aetiologically relevant breast cancer risk prediction by discovering new ways of extracting information on breast cancer risk from a mammogram. Risk-based personalised breast screening could become part of the precision medicine era [17, 18].
Data Availability
For data access, please contact the corresponding author.
Grant support
This research was supported by the National Health and Medical Research Council (251533, 209057, and 504711), the Victorian Health Promotion Foundation, Cancer Council Victoria, Cancer Council NSW, Cancer Australia, and the National Breast Cancer Foundation. It has also been supported by the Breast Cancer Network Australia, the National Breast Cancer Foundation, Victoria Breast Cancer Research Consortium and was further supported by infrastructure provided by the Cancer Council Victoria and the University of Melbourne. We thank the Victorian Cancer Registry, BreastScreen Victoria, and the Australian Mammographic Density Research Facility. TLN has been supported by Cure Cancer Australia Foundation through Cancer Australia Priority-Driven Collaborative Cancer Research Scheme (1159399). TLN and SL have been supported by Victorian Cancer Council Post-Doctoral Fellowships and grants from the Picchi Foundation, Victorian Comprehensive Cancer Centre. JLH is an NHMRC Senior Principal Research Fellow. MAJ and MCS are NHMRC Senior Research Fellows.
Disclosure of Potential Conflicts of Interest
GSD receives funding from Genetic Technologies Ltd for work unrelated to this study.
Figure panels: 1(a) Cumulus versus Cirrocumulus; 1(b) Cumulus versus Cirrus; 1(c) Cirrocumulus versus Cirrus.