Towards Clinical Prediction with Transparency: An Explainable AI Approach to Survival Modelling in Residential Aged Care ======================================================================================================================== * Teo Susnjak * Elise Griffin ## Abstract **Background** An accurate estimate of expected survival time assists people near the end of life to make informed decisions about their medical care. **Objectives** Use advanced machine learning methods to develop an interpretable survival model for older people admitted to residential age care. **Setting** A large Australasian provider of residential age care services. **Participants** All residents aged 65 years and older, admitted for long-term residential care between July 2017 and August 2023. **Sample size** 11,944 residents from 40 individual care facilities. **Predictors** Age category, gender, predictors related to falls, health status, co-morbidities, cognitive function, mood state, nutritional status, mobility, smoking history, sleep, skin integrity, and continence. **Outcome** Probability of survival at all time points post-admission. The final model is calibrated to estimate the probability of survival at 6 months post-admission. **Statistical Analysis** Cox Proportional Hazards (CoxPH), Elastic Net (EN), Ridge Regression (RR), Lasso, Gradient Boosting (GB), XGBoost (XGB) and Random Forest (RF) were tested in 20 experiments using different train/test splits at a 90/10 ratio. Model accuracy was evaluated with the Concordance Index (C-index), Harrell’s C-index, dynamic AUROC, Integrated Bier Score (IBS) and calibrated ROC analysis. XGBoost was selected as the optimal model and calibrated for time-specific predictions at 1,3,6 and 12 months post admission using Platt scaling. SHapley Additive exPlanations (SHAP) values from the 6-month model were plotted to demonstrate the global and local effect of specific predictors on survival probabilities. **Results** For predicting survival across all time periods the GB, XGB and RF ensemble models had the best C-Index values of 0.714, 0.712 and 0.712 respectively. We selected the XGB model for further development and calibration and to provide interpretable outputs. The calibrated XGB model had a dynamic AUROC, when predicting survival at 6-months, of 0.746 (95% CI 0.744-0.749). For individuals with a 0.2 survival probability (80% risk of death within 6-months) the model had a negative predictive value of 0.74. Increased age, male gender, reduced mobility, poor general health status, elevated pressure ulcer risk, and lack of appetite were identified as the strongest predictors of imminent mortality. **Conclusions** This study demonstrates the effective application of machine learning in developing a survival model for people admitted to residential aged care. The model has adequate predictive accuracy and confirms clinical intuition about specific mortality risk factors at both the cohort and the individual level. Advancements in explainable AI, as demonstrated in this study, not only improve clinical usability of machine learning models by increasing transparency about how predictions are generated but may also reveal novel clinical insights. SUMMARY BOX **Section 1: What is already known on this topic** * Existing models for estimating survival in aged care settings have been primarily based on prognostic indices which do not have advanced capabilities of machine learning approaches. * There is a notable absence of both machine learning and AI tools that provide high interpretability of models and their predictions in residential aged care settings, crucial for clinical decision-making. **Section 2: What this study adds** * Our study applies and demonstrates the utility of machine learning models for survival prediction in residential aged care settings, with a focus on the six month survival probabilities. * The study performs extensive experiments using numerous algorithms, and demonstrates how multiple tools can be used in concert to provide personalized and highly interpretable predictions that enable clinicians to discuss care preferences with patients and families in an informed manner. * This research sets a benchmark on how various AI technologies can be integrated with machine learning to offer effective solutions and greater transparency for clinical decision-making in aged care settings specifically, and predictive healthcare analytics more generally. Keywords * Survival analysis * geriatric care * clinical decision support * machine learning * explainable AI * residential aged care services * palliative care * TRIPOD ## 1 Introduction Predicting death is easy. Everybody will die. Estimating the precise probability of death for an individual within a specific time period is more difficult. An accurate estimate of expected survival time helps people choose treatments that align with their goals of care [1]. A falsely optimistic prognosis reduces the quality of death experienced by a patient and their loved ones [2]. Palliative care in people with a terminal diagnosis focuses on withdrawing treatments that cause pain or suffering and offering care that enhances the quality of remaining life. Many people are willing to endure short-term discomfort to increase survival time but there comes a point when sacrificing quality for quantity is no longer justifiable. For some, this realization comes just days before death, while for others, it may be recognized several months prior [3]. People entering residential aged care do not usually have a specific terminal illness. Rather, they are undergoing the inexorable decline in function that accompanies chronic illness and natural aging [4]. Shifting from an active treatment model to a palliative approach in this setting is a nuanced decision and is not always clearly communicated with residents or their families [5]. However, more than one-third of older people admitted to residential aged care will die within six months of admission [6]. In most healthcare settings, this prognosis would prompt discussions about the patient’s care preferences in view of their short life expectancy. Yet these vital conversations occur less frequently than many older people would prefer [7]. This study has two primary motivations. Firstly, we aim to employ advanced machine learning techniques to develop a reliable and accurate prognostic model for individuals entering residential aged care. By identifying those with a limited prognosis upon admission, we hope to empower healthcare providers with decision-support tools that aid transparent discussions with residents and their families about their end-of-life preferences. Secondly, our more expansive goal is to advocate for residential age care as a central provider of palliative services for those nearing the natural end of life. By emphasizing the limited life expectancy of people admitted to residential care and promoting open dialogue about it, we hope to enhance quality of care for all residents. ## 2 Background The development of prognostic models for mortality in gerontological research has been extensive, yet the application of these models to the narrower context of residential aged care facilities has not been as fully explored. However, scholarly efforts are advancing. A recent study [8] has systematically reviewed the use of predictive tools for all-cause mortality in older adults living in residential care. The work however focused only on multivariate prognostic indices. While these indices offer valuable insights and capabilities, they represent only a proportion of available tools. Amongst alternative tools, statistical approaches such as Cox proportional hazards models for instance, are also utilized, at times in concert with prognostic indices to enhance predictive accuracy. Conversely, the integration of machine learning models is still embryonic in the literature, indicating an underexplored frontier in prognostic research for this specific cohort of older people. ### Prognostic indices Flacker and Kiely [9] produced one of the earliest mortality scores for people in residential age care based on the Minimum Data Set (MDS), mandated in all US nursing homes for capturing clinical and administrative data. Forty-four of the 65 predictors they tested were associated with 1-year mortality in bivariate proportional hazards analysis, while eight were associated with 1-year mortality in a multivariate proportional hazards regression. They developed a mortality score by assigning points based on the risk ratio representing the association of each individual predictor with mortality. The authors expanded this work in 2003 to refine the mortality risk index score based on a prospective cohort study of 1,174 nursing home residents followed for 1 year or until death [10]. The most significant predictors of mortality were advanced age, male sex, low body mass index, severe cognitive impairment, dependence in activities of daily living, pressure ulcers, congestive heart failure, chronic obstructive pulmonary disease, and cancer. Porock et al. [11] also leveraged the MDS dataset to develop and test a prognostic model for 6-month mortality risk in elderly nursing home residents with the goal of creating a practical tool to guide end-of-life (EoL) care decisions. They used stepwise logistic regression to identify 14 MDS items related to functional, cognitive, and disease status, and adverse events that collectively predicted mortality with reasonable accuracy (C-statistic 0.75). A point-based scoring index termed the MDS Mortality Risk Index (MMRI) was derived, allowing estimation of 6-month survival across risk strata. In a subsequent study, Porock et al. [12] revised and simplified MMRI and created MMRI-R, based on 12 predictors from the MDS. The revised tool included items such as recent unintentional wight loss and poor appetite, as well as deteriorations in cognitive function. The MMRI-R included dehydration, shortness of breath and a score reflecting the activities of daily living, as well as the previously identified predictors like age, gender, active cancer diagnosis, renal failure, and chronic heart failure. More recently, Niznik et al. [13] have adapted and revised the MMRI-R. The updated aMMRI is reported as outperforming the original as well as other similar mortality risk tools in terms of discrimination (accuracy in predicting who will die) and calibration (accuracy of predicted probabilities). The revised tool reflects changes in the MDS dataset, with the key difference between the new and the previous versions of the tool centring around the two substitute items for poor appetite and a recent deterioration in cognition that are no longer available in the MDS. A prognostic tool for predicting 6-month mortality for nursing home residents with advanced dementia using the MDS dataset was also developed by Mitchell et al. [14]. The authors used Cox proportional hazards models to identify 12 relevant predictors (length of stay, age, gender, dyspnoea, pressure ulcers, total functional dependence, bedbound, insufficient oral intake, bowel incontinence, body mass index, weight loss, and congestive heart failure) which were significantly associated with 6-month mortality. They assigned point values to each selected variable according to the hazard ratios, and summed the points to create a risk score ranging from 0 to 19. They prospectively validated the performance of their Advanced Dementia Prognostic Tool (ADEPT), finding that the ADEPT score had modest discrimination (AUROC = 0.67) and good calibration compared with hospice eligibility guidelines, which showed poor discrimination (AUROC = 0.55) and low sensitivity (0.20) [15]. However, the ADEPT score did not perform significantly better than hospice guidelines when examined as a dichotomous measure using a cutoff with the same specificity. Recently, Ogarek et al. [16] revised the Changes in Health, End-Stage Disease, Signs and Symptoms (CHESS) Scale, widely use to predict mortality in nursing home residents. The revision was undertaken to align the scale with the current version of the MDS dataset (and the Medicare Master Beneficiary Summary File) and to ensure the tool was specific for predicting mortality in nursing home residents, since the original CHESS [17] was devised to predict mortality in people in complex continuing care (CCCs) hospitals. The updated CHESS scale ranges from 0 (most stable) to 5 (least stable) and includes indicators of end-stage disease, cognitive and physical impairment, acute mental status change, aggressive behaviour, impaired daily decision-making, and health conditions such as dehydration, pressure ulcers, swallowing disorder, respiratory failure, shortness of breath, and heart failure. The scale was found to be strongly associated with mortality, hospitalization, and successful discharge from nursing home care, in both new and long-stay nursing home populations. The authors suggest that the revised CHESS scale can be a valuable tool for risk adjustment, advance care planning, and identifying residents for hospice referral. ### Statistical approaches Cox proportional hazards regression has been used extensively to analyse survival in community-dwelling older people and those residing in aged care facilities [18, 19, 20, 21, 22, 23, 24, 25, 26]. This method, along with other similar standard statistical methods, are predominately used to identify key predictors of mortality during the development of prognostic indices.[27] These traditional statistical models, while using highly interpretable coefficients, are constrained by assumptions about data distributions and linearity, and often fail to account for interactions between variables. They depend heavily on domain expertise and usually utilise a small number of variables. In a recent review, Woodman and Mangoni [27] report that these constraints result in generalized population-level models primarily intended to determine the predictive value of risk factors and the mean risk for individuals possessing a specific combination of these factors. Consequently, these models can be less useful for personalized risk prediction and treatment recommendations, essential aspects of contemporary geriatric medicine [28]. Deardorff et al. [29] is one of the exceptions where the authors used Cox proportional hazards regression for mortality prognostication for older adults; however, their study considered community-dwelling adults with dementia. The study developed and externally validated the mortality prediction model and distilled a set of predictors encompassing demographic, health, behavioral, functional, and chronic condition variables. A clinically viable model was produced with a 1 to 10-year span, encapsulating nine significant predictors. Meanwhile, Rauh et al. [30] developed a predictive model for 14-day mortality in antibiotic-treated nursing home residents with dementia and pneumonia. The study used logistic regression, concluding that the prognostic model can be a useful tool to support decision-making for EoL care. Similarly, Falcone et al. [31] produced mortality-predictive models for clinical use in patients residing in nursing homes or long-term care facilities with a diagnosis of pneumonia where logistic regression models were used to predict 30-day mortality. ### Machine learning Machine learning techniques have the potential to overcome some of the limitations of traditional statistical approaches to survival analysis in geriatric populations, while increasing accuracy and clinical usability [32]. Machine learning methods are increasingly used in clinical prediction models. Spooner et al. [33] compared ten machine learning algorithms for predicting development of dementia, utilizing high-dimensional clinical data from the Sydney Memory and Ageing Study (MAS) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Their models demonstrated high concordance index values, signifying satisfactory accuracy in predicting dementia onset. In a similar context, Wang et al. [34] developed a deep learning model to predict mortality risk in people from a variety of settings with Alzheimer’s disease and related dementias, using longitudinal electronic health records from a nonprofit integrated healthcare system in Boston, Massachusetts. Their approach, using demographic information and clinical notes, demonstrates the potential of deep learning for improving mortality prediction. Evolving from traditional regression-based models to advanced machine learning models has the potential to not only improve prognostication, but also contribute to new clinical decision support tools. Machine learning algorithms accommodate many more variables than traditional statistical models. By accurately representing the complex interactions and non-linear relationships that exist in most clinical data, machine learning can generate deeper insights into underlying data relationships, potentially leading to the discovery of novel risk factors [35, 36, 33, 34]. Machine learning approaches in survival analysis, however, are not without limitations. They can be computationally demanding and possess “black box” characteristics, making their interpretability a significant challenge. This lack of transparency is a critical issue in healthcare, where understanding the rationale behind predictions is essential for clinical implementation. Shapley Additive exPlanations (SHAP) and other interpretable AI techniques have emerged as solutions to enhance the transparency of machine learning models. Effective use of SHAP values to identify and interpret the significance of various predictors in mortality risk predictions [34, 37] and in people with a dementia diagnosis [38] has been demonstrated. By providing transparency about the contribution of each predictor to prognostic prediction, these techniques known as Explainable AI (XAI), enhance the clinical utility of the model’s outputs. ### Objectives The literature on the development of mortality prognostic models in gerontological research indicates that the focus has historically leaned on multivariate prognostic indices and traditional statistical methods, especially within the specific context of residential aged care. These traditional approaches, such as the Cox proportional hazards models used alongside prognostic indices, have provided valuable results and insights. Recently, the application of machine learning techniques in this field has begun to emerge, signalling a shift towards more sophisticated tools and potentially improved predictive capabilities. A notable gap persists in the utilisation of machine learning models specifically in the aged care setting, where their application remains largely unexplored. Compounding this advancement, the advent of XAI tools bridges a crucial gap by demystifying machine learning models, thereby enhancing their clinical usability through increased transparency. In light of the gaps identified in the literature, the objectives of this study are to: 1. Establish the feasibility of developing robust survival models using data acquired about patients during their first 31 days of admission to long-term aged care facilities. 2. Determine the potential of various machine learning algorithms for survival predictions over various time horizons, aiming to identify the optimal algorithm in this context. 3. Calibrate the predictive models to accurately forecast survival probabilities at a six-month time horizon post-admission, facilitating the optimization of targeted palliative care strategies. 4. Demonstrate how explainable AI techniques increase the transparency and interpretability of predictive models, enhancing their utility in clinical decision-making. 5. Propose an integrated framework combining predictive modelling, model interpretation, calibrated forecasts and clinical decision principles to optimize the real-world application of the survival models. ### Contribution The contribution of this research is the development of a suite of predictive models that are methodologically robust and clinically actionable. This work addresses a gap in current literature on using machine learning models for predicting mortality for people living in residential aged care facilities. A key feature of our work is the integration of XAI techniques which expose the internals of “black box” models generated by machine learning algorithms. By quantifying the relative contribution of specific predictors to the prognosis of individual residents, these tools have the potential to increase confidence in the modelling among clinicians. Generating patient-specific survival curves together with the identification of salient risk factors for any individual admitted to residential-age care is a novel feature of this work. ## 3 Methods This study is reported according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guideline [39]. ### 3.1 Data Source Healthcare data used in this study were collected during the provision of routine care to older individuals admitted for long-term care between 1st July 2017 and 30th August 2023 to facilities owned by a single large Australasian private residential aged care provider. ### 3.2 Participants Data from residents at 34 New Zealand and 6 Australian aged care facilities were included. Individuals were eligible for inclusion if they were admitted for long-term care on or after 1st July 2017. ### 3.3 Outcome The primary outcome is the survival probability of an individual resident at admission to a residential age care facility assessed in two ways. 1. A continuous survival curve showing survival probability at all time points up to six years post-admission 2. A point-estimate of the probability of survival at six-months post-admission ### 3.4 Predictors All predictors included in the model were selected from demographic and clinical data recorded by registered nurses during the initial clinical evaluation of newly admitted residents. Medication data were taken from the electronic medicine chart. The earliest instance of each specific predictor following admission was used. Any predictor recorded more than 31 days after admission was excluded. Our objective was to predict survival probability from the time of admission. Data collected more than one month post-admission was discarded. Tables 1 and 2 show the list of all predictors included in the model and the values assigned to each level. Table 1 details the demographic attributes of the study cohort, including reasons for discharge and Rx-Risk Co-morbidity Index diagnostic categories [40]. Table 2 reports clinical variables drawn from the initial nursing assessment. Domain expertise was used to assign ordinal values representing the degree of severity for each level of the predictor. High values represent a state associated with worse function or clinical status (higher mortality risk) and low values represent states associated with better function or clinical status (lower mortality risk). View this table: [Table 1:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/T1) Table 1: Cohort demographic predictors, their frequencies and transformation values used for modelling. View this table: [Table 2:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/T2) Table 2: List of predictors, together with their raw values, the assessment used to capture them, their distribution as well as their transformation for modelling. Predictors relating to nutrition, mobility, smoking, sleep, skin integrity and continence are from a standardised set of questions and answers based on the InterRAI long-term care facilities assessment [41]. Predictors relating to current health status, cognition, mood and pressure ulcers are drawn from validated InterRAI-based composite measures (respectively, the CHESS scale (Changes in Health, End-Stage Disease, Symptoms and Signs) [17, 16], cognitive performance scale [42], depression rating scale [43], pressure ulcer risk scale [44]). Co-morbidities are assessed in two ways. The presence of a specific diagnosis was established by filtering the diagnosis fields in the resident clinical record for terms that captured dementia of any type, ischemic cardiac disease and heart failure of any type, any malignant neoplasm, any non-cancer pulmonary disease and any form of diabetes, excluding glucose intolerance. The sum of the scores for these items (1 = ANY diagnosis of this type present, 0 = ALL diagnoses of this type absent), rather than the individual diagnosis, was used as a predictor in the final model (minimum value = 0, maximum value =5). The Rx-Risk Co-morbidity Index is included as a second comorbidity item. This index provides a weighted score for each specific diagnosis based on prescription data. The scores are summed for an individual resident to provide the final Rx-Risk score. The falls predictor was a bespoke question used by the provider about the frequency of falls in the past six months. ### 3.5 Sample Size Sample size was determined by the availability of data. Complete digital personal health records for all residents, including electronic medicine chart data, were available from July 1st 2017. We utilised all data from current and discharged long-term residents admitted on or after this date for model development. ### 3.6 Missing Data Missing data was encountered at varying degrees for most predictors, as reported in Table 2. Predictors with 75% or more missing values were excluded. The Multiple Imputation by Chained Equations (MICE) [45, 46] was used to impute missing values for all predictors, motivated by recent studies demonstrating this approach in the context of survival analyses [47, 48]. Table 2 reports the percentage of missing values for each included predictor. ### 3.7 Statistical Analysis Methods This study explored traditional algorithms2 comprising Lasso Regression [51], Ridge Regression [52], Elastic Net [53] and Cox Proportional Hazards [54] alongside machine learning approaches such as Gradient Boosting [55], XGBoost [56] and Random Forest [57]. During the preliminary stages of our data processing, a pairwise correlation coefficient threshold of 0.7 was used as a guide for eliminating highly correlated variables to ensure model parsimony and reduce multicollinearity [58]. The decision on which one of the variables to exclude from the model was made by examining data quality and completeness, the relative univariate predictive power of the variable, and potential interpretability. We standardized the data prior to use for the Lasso, Ridge, and Elastic Net algorithms. Before running experiments with these algorithms, hyperparameter tuning was executed using train/test splits. This process involved exploring a range of hyperparameters, guided by the goal of maximizing model performance3. Subsequent to hyperparameter tuning, each algorithm was tested in 20 experiments using different train/test splits at a 90/10 ratio. For algorithms requiring a validation set, the training set was further divided using a 90/10 split. Outcomes from these experiments were aggregated and presented with 95% confidence intervals. Model performance was assessed using various evaluation measures, detailed in Table 3. The multi-metric approach enables the evaluation of both uncalibrated and time-specific metrics and offers a comprehensive understanding of each model’s predictive accuracy, discriminative power, and reliability. These metrics serve both as individual model performance indicators and as tools for model comparison. The best-performing model was calibrated for time-specific predictions at a six-month time point using Platt scaling [59, 60]. The effectiveness of the calibration also entailed performing 20 train/test splits at a 90/10 ratio to evaluate the accuracy, which was visualized via a calibration plot and reported through Dynamic AUROC, IBS C-index and Harrell C-index. The specificity, sensitivity and negative predictive power (accuracy in forecasting mortality for the people who died within six months of admission) of this model were inspected via ROC curve analyses. View this table: [Table 3:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/T3) Table 3: Overview of evaluation metrics used in the study. The final best-performing model from the analyses, its calibration together with an example code on how to use it, is publicly available from a GitHub repository4. ### 3.8 Ethical Considerations and Study Protocol Ethics approval for this study was granted by the Aotearoa Research Ethics Committee (formerly New Zealand Ethics Committee, NZEC22_11) and noted by the Human Ethics Committee (Ohu Matatika 2) of Massey University. Findings from the preliminary per protocol analyses were disappointing, therefore the study deviated from the protocol in order to expand the modelling scope to include mortality indicators beyond only falls and behavioural changes and have also included people without a dementia diagnosis. The research questions and the above ethics application were thus amended accordingly. ### 3.9 eXplainable AI Tools Balancing predictive strength and model interpretability is a specific challenge in contemporary machine learning research. As algorithms increase in complexity, the transparency and explainability of derived models diminish, creating potential concerns in settings, such as healthcare, where decisions based on model outputs must align with ethical and regulatory standards. eXplainable Artificial Intelligence (XAI) is an important line of research that addresses transparency in machine learning and comprises a suite of tools designed to expose the internal decision-making processes of advanced models. Of these tools, SHapley Additive exPlanations (SHAP), used in this study, is an exemplar of XAI methods [67, 68]. We report model behaviour from both *global* (cohort-level) and *local* (patient-level) interpretative dimensions. At a cohort-level, we provide SHAP Summary plots to show the ranked impact of predictors on survival probability and SHAP Dependence plots to illustrate the effect of interactions between predictors on predicted survival. These plots provide a *macroscopic* lens into the primary determinants of the model predictions. For patient-level analyses, we use SHAP Waterfall plots. These plots provide a granular examination of individual data points, detailing the contribution of each predictor to a specific prognosis. These plots provide patient-level information on the most relevant predictors in each individual, making the algorithms’ predictive processes transparent and increasing clinician confidence in the model output. ## 4 Results ### 4.1 Overview We present our results in two parts. In the first section, we report performance metrics for a variety of uncalibrated general models trained to predict survival probability at all time points up to six years post-admission. We use individualised survival curves and SHAP summary, dependence and waterfall plots to provide insight into the behaviour of the best-performing uncalibrated model. In the second part, we report performance metrics for a similar model, calibrated to predict survival probability at six months post-admission time. Finally, we present the clinical evaluation metrics of sensitivity, specificity, and negative predictive value for the calibrated model. ### 4.2 Participants Data from 12882 individuals were extracted from the database. We eliminated 407 individuals who lacked requisite assessment data within 31 days of admission from the cohort. We reconciled data from residents with one or more consecutive admissions, resulting in a cohort of 11945 unique individuals. Data from one resident was excluded due to a negative value length of stay value. The final cohort contained data from 11944 individuals. The mean age of people in the cohort was 86 years (SD 7) and the majority were women (n= 7200, 60%). Just over half the cohort (n = 6739, 56%) were discharged due to death (the modelled outcome) and approximately 30% were current residents (n = 3465). Three-quarters of residents had an electronic medicine chart initiated within 31 days of admission, allowing us to estimate the Rx-Risk Comorbidity Index for these individuals (n = 9065, 76%) [40]. The most common diagnostic categories based on prescription data were pain, a psychotic disorder (most likely behavioural and psychological symptoms of dementia), congestive heart failure and gastro-oesophageal reflux. The full distribution of Rx-Risk Comorbidity Index categories is reported in Table 1. ### 4.3 Model Performance Table 4 shows evaluation results of survival models across a time horizon of up to 74 months. The best-performing models, according to the C-index, are the ensemble methods, Gradient Boosting, Random Forest and XGBoost with negligible differences between them. These models exhibit C-indices of 0.712 to 0.714, supported by narrow 95% confidence intervals, indicating effective discriminatory power and robust statistical stability. The leading C-index of the top three models is complemented by their Harrell’s C-index score of *∼*0.67 and an AUROC of *∼*0.75, confirming effective performance in both discrimination and calibration. Only marginally lower, CoxPH, Ridge and Lasso regression exhibit similar performance on this dataset across all the metrics. Elastic Net performed significantly worse than all other candidate models. View this table: [Table 4:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/T4) Table 4: Rank-ordered performance metrics of different models across the entire survival period up to 74 months post admission to a care facility showing 95% confidence intervals across the key metrics. ### Model Interpretability Here we examine the internal mechanics of the model and the impact of each predictor at the cohort level and at the individual patient level. For this analysis, we selected XGBoost5, one of the top performing models according to Table 4, for a more detailed inspection. The SHAP summary plot in Figure 1 presents the predictors included in the model ranked in order of importance (greatest influence on predicted survival at the top, least influence on predicted survival at the bottom). The SHAP values are shown on the x-axis. SHAP values quantify the contribution of each predictor to the model estimate of mortality risk, in deviation from the mean prediction. The grey vertical line represents a zero-impact mean prediction. Positive values (to the right of the zero-impact line) are associated with increased mortality risk and negative values (to the left of the zero-impact line) with reduced mortality risk. As the data points for each predictor move further from the vertical line, the greater the impact of this predictor on expected survival becomes. The colour spectrum (blue to red) across the SHAP value scatter shows the value of the predictor value, with blue indicating lower values and red signifying higher values. This relationship is most easily visualised in the ‘age_category’ and ‘chess_scale_score‘, where higher values (older age or worse health status respectively) are both red and associated with high positive SHAP values (large impact on the prediction of increased mortality risk). ![Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F1) Figure 1: Predictor importance summary plot for the XGBoost model. We also gain insights and witness the asymmetric effects that certain predictors and their values exert in influencing the final predicted risk scores. For instance, ‘rx\_risk\_score‘, ‘poor\_eating\_or\_lack\_of\_appetite’ and the ‘pressure\_ulcer\_risk\_score‘, exhibit a much stronger effect on elevating the predicted risk scores as their predictor values increase, while the reverse effect is smaller on reducing risk as their predictor values decrease. This is also in line with expectations, since for example, evidence of poor eating or a lack of appetite ought to have a greater effect on the model than a lack of evidence thereof. Other notable predictors are ‘specific\_health\_conditions‘and ‘cognitive\_performance\_scale\_score’ which are largely consistent in signalling that a deterioration (higher values) in these predictors tends to also increase the predicted risk. However, a less coherent signal accompanies the ‘depression\_rating\_scale\_score’, ‘skin\_integrity_score‘, ‘faecal_incontinence’ and ‘falls_history’ (investigated further below in dependence plots) with some signs of ambivalence with respect to predicted risk scores as the underlying predictor values change. A set of three Dependence Plots are depicted in Figure 2a-c, offering a deeper understanding of pairwise interactions between a selection of predictors. These visualizations depict how interactions of pairs of predictors influence the prediction of risk as their underlying values vary. As previously stated, these figures are based on the XGBoost model. For each of the selected predictors, the SHAP tool automatically selects the most interactive corresponding predictor. The x-axis represents the values for a chosen predictor, while the gradient colour bar on the y-axis represents the values of the counterpart predictor. The interaction of both is depicted with respect to the magnitude of the impact they exert on the final prediction. The dashed horizontal line represents a neutral effect on the model output. Points above this line indicate an increase in mortality risk, while the opposite holds for values below the dashed line. The relative distance from the dashed line indicates the magnitude of the effect exerted on the mortality risk. ![Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F2) Figure 2: Dependence plots showing pairwise interactions between a selection of predictors and how their interactions affect patient risk predictions by the XGBoost model. With an increasing ‘chess\_scale\_score’ in Figure 2a, the mortality risk gradually increases. For lower values of the ‘chess\_scale\_score‘, the interaction with increasing patient age tends to elevate overall risk. This relationship, however, does not seem to hold for higher values of the ‘chess\_scale\_score‘. A distinct pattern emerges for the ‘rx\_risk\_score’ predictor in Figure 2b as its values increase. A score of less than 10 for ‘rx\_risk\_score’ does not show a tendency to increase mortality risk. The interaction of lower values for this predictor with increasing values of frailty represented by the ‘chess\_scale\_score’ tends not to elevate risk. However, an inflexion point occurs from 10 onwards for the ‘rx\_risk\_score‘, at which point increasing values for both this predictor and the ‘chess\_scale\_score‘, interact to significantly elevate the mortality risk. Finally, Figure 2c shows the interaction between the ‘cognitive\_performance\_scale\_score’ and the ‘poor\_eating\_or\_lack\_of\_appetite’ values. The ‘cognitive\_performance\_scale\_score’ tends to have the highest effect on elevating risk for the lowest and highest scores of this predictor. Overall, high ‘poor\_eating\_or\_lack\_of\_appetite’ values seem to have a larger interaction effect on increasing risk with lower values of ‘cognitive\_performance\_scale_score‘. ### Clinical usage and application The integration of survival analysis models into clinical practice is pivotal for informed medical decision-making. Here, we transition from theoretical modeling and inspection of a model from a high-level to a real-world application, using data from two anonymised patients as exemplars. The survival probability curve derived from the uncalibrated model and shown in Figure 3, illustrates each patient’s predicted survival trajectory in comparison to the cohort average. The figure indicates that the survival probabilities for patient B are significantly lower than those of patient A, and well below the cohort average across the entire timeframe of potential observation. This initial output allows clinicians to gauge individual patient risk in the context of broader population trends. However, while the initial outputs are useful, additional insights are needed to unpack how and why the model is arriving at different risk profiles for specific patients. ![Figure 3:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F3.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F3) Figure 3: Patient survival function from the Gradient Boosting model depicting the risk of two exemplars versus the cohort. Subsequently, the utilization of SHAP waterfall plots seen in Figure 4a-b offers patient-level model interpretability. These plots reveal what the key predictors and their values are and how they influence the model’s survival predictions for each patient. The emphasis here is on practicality: enabling clinicians to comprehend the underpinnings of the model’s output, ensuring that its insights can be validated, trusted and ultimately integrated into tailored patient management strategies. Through this approach, we demonstrate the confluence of advanced analytical tools with clinical utility, underscoring their role in optimizing patient care. These plots are best interpreted from bottom-up. The y-axis shows the most impactful predictors with their values, and their relative contributions. The starting point on the x-axis is the average or the expected risk for the whole cohort. Each predictor pushes the risk to the left (to lower risk) or to the right (to increase risk) until all contributions are summed at the top row. For patient A in Figure 4a, we can see that the patient’s result on the use of mobility equipment increases their risk; however, their overall risk is significantly lowered by their independent mobilisation, low pressure ulcer risk score, low number of ongoing health conditions and their female gender. Meanwhile, for patient B in Figure 4b, it can be observed that their high risk is predominately driven by their high prescription risk score, limited mobility as well as their pressure ulcer risk score. ![Figure 4:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F4) Figure 4: Waterfall plots from the XGBoost model showing interpretability for patient-level predictions using two hypothetical examples. ### 4.4 Calibrated Time-specific Survival Models In an evaluation of survival models tailored for different forecast horizons, the calibration plots with 95% confidence intervals for the target 6-month period (seen in Figure 5) yields insights into the model’s predictive accuracy. The plot features a calibration curve approximating the ideal 45-degree line, a sign of near-optimal calibration, where overall, the depicted model exhibits effective calibration from low to mid-range probabilities, while increased uncertainties around higher predicted survival probabilities (sub 0.8 probability) can also be seen. Beneath the calibration curve lies a histogram depicting the distribution of the predicted probabilities. The histogram intimates the density of predictions at different probability ranges. The shape of the distribution resembles a normal distribution with an albeit more pronounced tail for the lower probabilities, and a mean centred *∼*0.4. Additionally, the calibrated model was assessed by Hosmer-Lemeshow statistic test. The Hosmer-Lemeshow test yielded a chi-square value of 12.6 and a p-value of 0.128, indicating satisfactory model calibration. This statistical result suggests a reasonable congruence between the predicted probabilities and the observed outcomes, implying that the model neither significantly underfits nor overfits the data. Therefore, in summary, both the visual inspection and the results obtained by the Hosmer-Lemeshow test are indicative of the model being effectively calibrated. ![Figure 5:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F5.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F5) Figure 5: 6-month Gradient Boosting model calibration plot contrasting a perfectly calibrated pattern (dashed line) with the actual model, together with the distribution of the predicted probabilities. The performance metrics presented in Table 5 offer a multi-perspective evaluation of Gradient Boosting models tailored for survival analysis across varying temporal horizons with respect to a range of metrics. For completion and comparisons, the accuracies of 1-, 3-6- and 12-month calibrated models are shown. The Dynamic AUROC serves as an emblematic metric for assessing a model’s discriminative capability. The observed downward trend in AUROC values as we move from short-term to longer-term forecasts is indicative of an interplay between model sensitivity and the inherent heterogeneity of patient trajectories over time. A declining AUROC is often attributed to the increased stochasticity of long-term forecasts, as can be seen in the table when contrasting the 1- and 12-month forecast accuracies. View this table: [Table 5:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/T5) Table 5: Performance metrics of calibrated Gradient Boosting models for time-specific forecasts. Forecast The IBS serves as a gauge for model calibration. Though the observed trend of declining AUROC values over increasing prediction horizons aligns with the prevailing literature, signaling a diminishing discriminative power for long-term forecasts, intriguingly the model’s IBS values improve (decrease) concurrently. This is counterintuitive given the conventional wisdom that long-term forecasts usually suffer from poor calibration. This paradox can however be explained through the lens of the bias-variance tradeoff: the results suggest that as the models become better calibrated over time, their variance reduces, thereby increasing bias and consequently reducing the models’ discriminative power. The C-index and Harrell’s index are used for their robustness in quantifying a model’s ability to correctly rank-order individual risks. The results indicate the models’ stability with respect to this across all forecast horizons. This is in contrast to the Dynamic AUROC, which demonstrates a deterioration as the prediction window extends. The stability in the C-index and Harrell’s C-index could be indicative of the model’s preserved efficacy in ranking the relative risk between individuals over time, even as its ability to separate the classes of events and non-events diminishes (evidenced by the declining AUROC). This points to an interplay between how the model assigns ordinal ranks to individual survival probabilities versus its performance in the classification of events. Thus, the observed stability in C-index and Harrell’s C-index adds a layer of confidence in the model’s utility for tasks that require risk stratification over dichotomous classification, a distinction that has important implications in a clinical setting. The results of the calibrated Gradient Boosting model exhibits attributes that are tied to the temporal granularity of its prognostic estimates. Although the Dynamic AUROC, a traditional metric of discriminative power, reveals a time-sensitive attenuation, this need not be misconstrued as a universal decline in model efficacy. Importantly, the model’s calibration, captured through the IBS, and its discriminatory consistency, as evidenced by stable C-index and Harrell’s C-index metrics remain largely unaltered across varying forecast horizons. This observed dichotomy between discriminative power and risk-ranking capacity necessitates a departure from monolithic evaluation frameworks. It underscores the imperative for a multi-metric paradigm that captures the multi-dimensional attributes of survival models. ### 4.5 Clinical Validity The ROC curve illustrated in Figure 6 for the Gradient Boosting model offers an empirical framework for clinicians to discern patients at a heightened risk of mortality within 6 months post-admission to long-term care facilities. It illustrates the balance between specificity (true negative rate, TNR) and sensitivity (true positive rate, TPR) achieved by varying the prediction threshold. The figure highlights the clinical implications of adopting a 0.2 survival probability threshold, equating to an 80% risk of death within the specified period, thereby guiding interventions. Performance metrics presented are derived from validation on a separate dataset, ensuring a robust appraisal of the model’s predictive capabilities. * **Sensitivity / True Positive Rate (TPR)**: This metric quantifies the model’s ability to identify actual survivors, with a threshold of 0.2 yielding a 95% TPR. This means that 95% of patients who survive beyond six months are accurately predicted by the model. * **Negative Predictive Value (NPV)**: NPV assesses the accuracy of the model in predicting non-survival. At a threshold of 0.2, the NPV is 74%, indicating that among those predicted not to survive, 74% did not survive past six months. * **False Positive Rate (FPR)**: FPR reflects the proportion of non-survivors incorrectly predicted as survivors. With a FPR of 71%, the model erroneously predicts survival in 71% of cases where the patient does not survive six months. * **Specificity/True Negative Rate (TNR)**: This metric measures the model’s precision in identifying nonsurvivors. Although a specificity of 0.29 is low, this level of caution is appropriate in a situation where under-estimating the likelihood of death within six months carries potentially less clinical and ethical risk than an over-estimation of risk. ![Figure 6:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F6.medium.gif) [Figure 6:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F6) Figure 6: ROC curve of the calibrated Gradient Boosting 6-month model predicting survival probabilities. An example operational threshold of 0.2 for predicting patient survival is highlighted as a practically useful decision point from a clinical perspective. ## 5 Discussion We are not the first investigators to develop prognostic tools for people admitted to residential aged care. The work we present here builds on and extends the work of many others [16, 12, 10, 15, 14, 13, 69]. Our contribution has been to use advanced machine learning algorithms and eXplainable AI on a very large set of standardised health data to generate a clinically useful decision support tool. We have demonstrated the feasibility of developing a survival model based on data acquired at the time of admission to residential care. People admitted to residential care have complex medical and nursing requirements and initial clinical assessments can take some time to complete, so we included data acquired up to 1 month post-admission. Our cohort included people admitted over a six-year period and the digital health record evolved and expanded during this time, with new assessments added to meet new regulatory and clinical requirements. Consequently, we struck the issue of substantial amounts of missing data for potentially useful variables we might have wished to include, a common problem in “real life” data sets, as opposed to data prospectively collected for research purposes. The predictors in our final model thus reflect a pragmatic balance struck between including the most consistently recorded and most relevant clinical details and our desire to retain the maximal amount of training data. We investigated the efficacy of a variety of traditional and machine learning approaches by conducting rigorous repeated experiments with seven different algorithms, using test-training splits to minimise over-fitting, having evaluated all models using appropriate performance metrics. We found minimal differences between the top-performing models. The three machine learning ensemble models (GB, RF and XGB) consistently outperformed other algorithms on most evaluations. CoxPH outperformed other traditional statistical methods for generating survival curves, but was not as high-performing as the ensemble models. The evaluation metrics for these models confirm satisfactory discriminatory power and robust statistical stability at a level of accuracy that matches or exceeds other prognostic models developed in residential aged-care populations [70, 71, 8]. We used an uncalibrated XGB model to generate a continuous function showing survival probability at every time point up to six years post-admission. This type of survival curve, generated for the entire cohort, for selected categories within the cohort (by age, gender, or CHESS score for example), or for individual patients, is familiar to most clinicians. Overlaying a survival curve for an individual resident with the survival curve for their cohort of peers creates an instantly usable visual aid that could be used to inform discussions about prognosis with a patient or their family, while avoiding being overly definitive or confronting. We then calibrated the top-performing model (GB) to generate a point-estimate of survival probability at six months postadmission and integrated a closely related uncalibrated variant model (XGB) with SHAP tools to produce visualizations that illuminate the internal decision-making processes within the model and the complex interactions between predictors. We calibrated to six months as it is a prognostic time frame where open discussions about patient preferences for end-of-life care become appropriate and necessary, as reflected in the recommendation to use the InterRAI palliative care assessment for people in residential care with a life expectancy of six months or less [72]. Funding for palliative services is usually limited to people with a terminal diagnosis who are expected to die within six months. This type of funding is rarely accessed by people admitted to residential aged care as most do not have a specific terminal diagnosis at any stage of their admission. More accurate prognostication in these individuals has the potential to address an equity issue by improving their access to appropriately funded palliative care services. Health professionals working in aged care are often reluctant to discuss prognosis with residents and their families for a variety of reasons despite the fact that more than one-third of people admitted for long-term care die within six months of their arrival. Many of these individuals are poorly served by our failure to openly acknowledge their limited life expectancy and are subjected to treatments that neither extend nor enhance the quality of their remaining life. They and their families are often inadequately prepared for death, resulting in a traumatic terminal experience for the resident and complex grief in the survivors. In an ideal world, sensitive but realistic conversations about prognosis and expected goals of care would occur with every person and their family as a routine part of their admission to aged care. In this study, we have applied the advanced analytical techniques offered by machine learning and XAI to create useful visual aids that could support these conversations. Ultimately, however, the value of any tool is only realised in its application. Decision support tools increase patient autonomy and enhance clarity in healthcare discussions, but only if healthcare providers choose to use them. ### 5.1 Study limitations The study acknowledges several limitations. The dataset used contained a significant amount of missing data across various predictors. Despite employing sophisticated imputation methods, the high proportion of missing values for some predictors creates uncertainty about their true values and the resultant effect on the models. Specifically, this contributed to ambiguous signals seen in some variables in the SHAP analyses, hampering model interpretability. The study’s focus on cross-sectional data captured at the time of admission also limits its predictive capacity. New onset falls or increased fall frequency and temporal changes in functional capacity, appetite, mobility and cognitive function are more likely to be predictive of mortality than the absolute values of these variables at a single time point. Homogeneity of the patient cohort, sourced from a single private aged care provider, likely restricts the model’s applicability to settings with more varied patient demographics or different clinical assessment protocols. Moreover, the study establishes technical efficacy but lacks an assessment of real-world clinical implementation factors. Pragmatic clinical trials are imperative to evaluate the proposed models’ perceived utility among staff end-users and tangible impacts on workflows and decision-making prior to actual deployment in day-to-day clinical practice. User-centred design principles could help optimize the integration and presentation of model insights at the point of care. ### 5.2 Future work Incorporating dynamic time-varying covariates into risk forecasts could improve accuracy, especially for predictions taking place after the initial admission time point. Expanding the feature space with descriptive variables capturing changes in patients’ underlying conditions over time may also add valuable revisions to patient risk trajectories. Additionally, testing transportability across diverse datasets is valuable to confirm the generalizability of the models. Conducting clinical trials would provide useful real-world validation of the models’ acceptability, trustworthiness, and measurable impacts on healthcare providers and patients. To advance prognostic models in aged care, future work should focus on integrating dynamic, time-varying covariates for more accurate predictions and broadening data features to reflect real-time changes in patient conditions. Validating these models across diverse datasets and through clinical trials is critical for ensuring their generalizability and realworld utility. Additionally, exploring new data types such as genomic biomarkers and medical imaging, and employing advanced techniques such as natural language processing, may uncover novel prognostic insights. While this study establishes a methodological foundation on the use of machine learning and interpretable AI tools, important work remains in translating prognostic modelling advances into safe, effective, and patient-centred clinical decision support tools that measurably improve end-of-life care delivery. ## 6 Conclusion This study is one of the first to develop machine learning models to predict survival probabilities for a general population of adults in residential aged care facilities. This work conducted extensive experiments using numerous techniques on a large dataset as well as a unique set of predictors, and demonstrated the feasibility of developing robust predictive survival models in this setting which can be used by clinicians for decision-making around appropriately targeted palliative care options. The use of advanced explainable AI (XAI) techniques was also demonstrated, showing how transparency and interpretability of the models and their outputs can be realised to render machine learning suitable for clinical decision-making, where trust in the AI-driven prognostic tools is enhanced. Predictive models were calibrated for multiple time horizons, with an emphasis placed on the six-month survival probabilities post-admission to a residential aged care facility. A unique and readily available set of predictors was used in the models, introducing both novelty but also general portability of the models to different contexts and facilities. TRIPOD reporting was adhered to and both the model parameters and code have been made publicly available. The proposed predictive framework, comprising the models and AI tools represents a significant step forward, offering a comprehensive approach to AI-driven healthcare for survival analysis in residential aged-care contexts and beyond. ### Competing Interests The authors have completed the ICMJE uniform disclosure form at [www.icmje.org/coi\_disclosure.pdf](http://www.icmje.org/coi_disclosure.pdf) and declare that no support from any organisation has been received for the submitted work. Neither are there financial relationships with any organisations that might have had an interest in the submitted work in the previous three years, nor other relationships or activities that could appear to have influenced the submitted work in any way. ### Transparency Declaration The lead author declares that the manuscript has been composed with integrity, ensuring an honest, accurate, and transparent portrayal of the research conducted. The lead author confirms that we have not omitted any significant aspects of our study. ### Funding The authors declare that no funding was provided to conduct this research. ### Patient and Public Involvement Statement Patients and the public were not directly involved in this study. ### A Supplementary Dependence Plots We see in Figure 7a that as the number of ‘specific\_health\_conditions’ increases, the overall risk follows at an acute rate. Despite high mobilization requirements in the absence of ‘specific\_health\_conditions‘, there is no elevation in risk. As soon as ‘specific\_health\_conditions’ are observed, higher associated mobilisation requirements tend to also elevate the risk. The patterns from the ‘falls_history’ predictor are less clear in Figure 7b. The x-axis represents the history of falls categorized into four discrete values: ‘0’ representing for 1 or no history of falls, ‘1’ for 4 or less in the last 6 months, ‘2’ for 5 or more in the last 6 months, and ‘3’ for 3 or more falls in a one-month period. Risk is high for patients having 1 or no history of falls in cases where their ‘poor\_eating\_or\_lack\_of\_appetite’ values are high. This relationship however does not hold as the frequency of the reported falls also increases. Meanwhile, the variance in the effect that the falls have on the risk disperses more greatly as falls increase, with an ambivalent relationship emerging with the ‘poor\_eating\_or\_lack\_of\_appetite’ predictor. Figure 7c, a clear and gradual rise in the risk can be observed for each increase in the ‘pressure\_ulcer\_risk\_score‘, which is only slightly amplified with increasing values in the number of ‘specific\_health_conditions‘. ![Figure 7:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F7.medium.gif) [Figure 7:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F7) Figure 7: Dependence plots showing pairwise interactions between a selection of predictors and how their interactions affect patient risk predictions by the XGBoost model. ### B Kaplan-Meier Estimates The Kaplan-Meier estimator serves as a cornerstone in survival analysis, providing a non-parametric method to estimate survival probability over time. This estimator is particularly adept at handling censored data, allowing for the accommodation of individuals whose event outcomes are unknown within the study period. In essence, it paints a statistical picture of the probability that a subject in a study will ‘survive’ beyond a certain time, despite not all subjects necessarily reaching the event endpoint. In Figures 8a-h, the Kaplan-Meier survival curves delineate the impact of various clinical and demographic predictors on survival probabilities within an aged care cohort. The survival function for the study cohort as a whole (Figure 8a) displays a standard declining curve over time, setting a benchmark against which the influence of other variables can be measured. When disaggregated by the number of specific health conditions (Figure 8b), the survival curves illustrate a clear trend: as the number of conditions increases, survival probabilities correspondingly diminish, showcasing the cumulative effect of comorbidities on patient survival. ![Figure 8:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/16/2024.01.14.24301299/F8.medium.gif) [Figure 8:](http://medrxiv.org/content/early/2024/01/16/2024.01.14.24301299/F8) Figure 8: Survival functions for the cohort (a) and a selecti2o5n of key predictors (b-h) calculated using the Kaplan-Meier estimator. The falls history (Figure 8c) is another predictor of interest, stratified into discrete categories based on the reported frequency of falls within specific time frames as defined in Table 2. As the frequency of reported falls increases, risk tends to also elevate. Gender differences are also evident (Figure 8d), with male and female survival curves diverging, reflecting the inherent differences in survival rates often observed between genders in epidemiological studies. Similarly, the method of mobilization (Figure 8e) shows a gradation of survival probabilities with increased mobilization requirements, emphasizing the importance of functional mobility as a determinant of outcomes in aged care settings. The Pressure Ulcer Risk Scale (Figure 8f), the Rx Risk Score (Figure 8g), and the CHESS scale score survival functions (Figure 8h) each offer a granular view of how clinical assessments and risk scores can predict patient survival. The CHESS scale, which evaluates health instability, particularly highlights the prognostic value of capturing and monitoring changes in a patient’s health status. ## Data Availability The data is confidential; however, the predictive model has been made available online with URL links provided in the article. [https://github.com/teosusnjak/survival-analysis-stage1](https://github.com/teosusnjak/survival-analysis-stage1) ## Footnotes * 2 The implementations of the algorithms from the Python library scikit-survival [49] version 0.21.0 were used and XGBoost [50] version 1.7.6. * 3 Interested readers can refer to the GitHub repository [https://github.com/teosusnjak/survival-analysis-stage1](https://github.com/teosusnjak/survival-analysis-stage1) for complete implementation details * 4 [https://github.com/teosusnjak/survival-analysis-stage1](https://github.com/teosusnjak/survival-analysis-stage1) * 5 While Gradient Boosting was technically the top performing model, both it and XGBoost are essentially the same algorithm with slightly different implementations. The implementation of XGBoost, however, lends itself better for interpretability analysis given its integration with SHAP tools. * Received January 14, 2024. * Revision received January 14, 2024. * Accepted January 16, 2024. * © 2024, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. [1]. Jane C Weeks, et al. “Relationship between cancer patients’ predictions of prognosis and their treatment preferences”. In: Jama 279.21 (1998), pp. 1709–1714. 2. [2]. Nicholas A Christakis et al. “Extent and determinants of error in doctors’ prognoses in terminally ill patients: prospective cohort studyCommentary: Why do doctors overestimate? Commentary: Prognoses should be based on proved indices not intuition”. In: Bmj 320.7233 (2000), pp. 469–473. 3. [3]. Alexi A Wright, et al. “Associations between end-of-life discussions, patient mental health, medical care near death, and caregiver bereavement adjustment”. In: Jama 300.14 (2008), pp. 1665–1673. 4. [4]. Melanie Luppa, et al. “Prediction of institutionalization in the elderly. A systematic review”. In: Age and ageing 39.1 (2010), pp. 31–38. 5. [5]. Maho Omori, et al. “The language of dying: Communication about end-of-life in residential aged care”. In: Death Studies 46.3 (2022), pp. 684–694. 6. [6]. Anne Kelly et al. “Length of stay for older adults residing in nursing homes at the end of life”. In: Journal of the American Geriatrics Society 58.9 (2010), pp. 1701–1706. 7. [7]. Tim Sharp et al. “Do the elderly have a voice? Advance care planning discussions with frail and older individuals: a systematic literature review and narrative synthesis”. In: British Journal of General Practice 63.615 (2013), e657–e668. 8. [8]. Shengruo Zhang, et al. “Prediction models of all-cause mortality among older adults in nursing home setting: A systematic review and meta-analysis”. In: Health Science Reports 6.6 (2023), e1309. 9. [9]. Jonathan M. Flacker and Dan K. Kiely. “A Practical Approach to Identifying Mortality-Related Factors in Established Long-Term Care Residents”. In: Journal of the American Geriatrics Society 46.8 (1998), pp. 1012– 1015. DOI: 10.1111/j.1532-5415.1998.tb02759.x. eprint: [https://agsjournals.onlinelibrarywileycom/doi/pdf/10.1111/j.1532-5415.1998.tb02759.x](https://agsjournals.onlinelibrarywileycom/doi/pdf/10.1111/j.1532-5415.1998.tb02759.x). URL:[https://agsjournals.onlinelibrary.wiley.com/doi/abs/10.1111/j.1532-5415.1998.tb02759.x](https://agsjournals.onlinelibrary.wiley.com/doi/abs/10.1111/j.1532-5415.1998.tb02759.x). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1532-5415.1998.tb02759.x&link_type=DOI) 10. [10]. Jonathan M Flacker and Dan K Kiely. “Mortality-related factors and 1-year survival in nursing home residents”. In: Journal of the American Geriatrics Society 51.2 (2003), pp. 213–221. 11. [11]. Davina Porock, et al. “Predicting death in the nursing home: development and validation of the 6-month Minimum Data Set mortality risk index”. In: The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 60.4 (2005), pp. 491–498. 12. [12]. Davina Porock, et al. “The MDS Mortality Risk Index: The evolution of a method for predicting 6-month mortality in nursing home residents”. In: BMC research notes 3 (2010), pp. 1–8. 13. [13]. Joshua D Niznik et al. “Adaptation and initial validation of Minimum Data Set (MDS) mortality risk index to MDS version 3.0”. In: Journal of the American Geriatrics Society 66.12 (2018), pp. 2353–2359. 14. [14]. Susan L Mitchell, et al. “Estimating prognosis for nursing home residents with advanced dementia”. In: Jama 291.22 (2004), pp. 2734–2740. 15. [15]. Susan L Mitchell, et al. “Prediction of 6-month survival of nursing home residents with advanced dementia using ADEPT vs hospice eligibility guidelines”. In: Jama 304.17 (2010), pp. 1929–1935. 16. [16]. Jessica A Ogarek et al. “Minimum Data Set Changes in Health, End-stage disease and Symptoms and Signs Scale: a revised measure to predict mortality in nursing home residents”. In: Journal of the American Geriatrics Society 66.5 (2018), pp. 976–981. 17. [17]. John P Hirdes, et al. “Use of the interRAI CHESS scale to predict mortality among persons with neurological conditions in three care settings”. In: PloS one 9.6 (2014), e99066. 18. [18]. Johane P Allard et al. “Nutrition risk factors for survival in the elderly living in Canadian long-term care facilities”. In: Journal of the American Geriatrics Society 52.1 (2004), pp. 59–65. 19. [19]. Kathryn L. Hicks, Peter V. Rabins, and Betty S. Black. “Predictors of Mortality in Nursing Home Residents With Advanced Dementia”. In: American Journal of Alzheimer’s Disease & Other Dementias® 25.5 (2010). pmid:PMID: 20601644, pp. 439–445. DOI: 10.1177/1533317510370955. eprint: https://doi.org/10.1177/ 1533317510370955. URL: https://doi.org/10.1177/1533317510370955. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/1533317510370955&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20601644&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F01%2F16%2F2024.01.14.24301299.atom) 20. [20]. Hugo Lopes, Céu Mateus, and Nicoletta Rosati. “Impact of long term care and mortality risk in community care and nursing homes populations”. In: Archives of Gerontology and Geriatrics 76 (2018), pp. 160–168. ISSN: 0167-4943. DOI: 10.1016/j.archger.2018.02.009. URL: [https://www.sciencedirect.com/science/article/pii/S0167494318300281](https://www.sciencedirect.com/science/article/pii/S0167494318300281). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.archger.2018.02.009&link_type=DOI) 21. [21]. Mette Reilev et al. “Morbidity and mortality among older people admitted to nursing home”. In: Age and Ageing 49.1 (Nov. 2019), pp. 67–73. ISSN: 0002-0729. DOI: 10. 1093 / ageing / afz136. eprint: [https://academic.oup.com/ageing/article-pdf/49/1/67/31674536/afz136.pdf](https://academic.oup.com/ageing/article-pdf/49/1/67/31674536/afz136.pdf). URL: doi:10.1093/ageing/afz136. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ageing/afz136&link_type=DOI) 22. [22]. Alicia Padrón-Monedero et al. “Falls and long-term survival among older adults residing in care homes”. In: PLOS ONE 15.5 (May 2020), pp. 1–14. DOI: 10.1371/journal.pone.0231618. URL: 10.1371/journal.pone.0231618. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0231618&link_type=DOI) 23. [23]. José Fermín García-Gollarte et al. “Risk Factors for Mortality in Nursing Home Residents: An Observational Study”. In: Geriatrics 5.4 (2020). ISSN: 2308-3417. DOI: 10.3390/geriatrics5040071. URL: [https://www.mdpi.com/2308-3417/5/4/71](https://www.mdpi.com/2308-3417/5/4/71). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/geriatrics5040071&link_type=DOI) 24. [24]. Alexia Charles et al. “Physical performance trajectories and mortality among nursing home residents: results of the SENIOR cohort”. In: Age and Ageing 49.5 (Mar. 2020), pp. 800–806. ISSN: 0002-0729. DOI: 10.1093/ ageing/afaa034. eprint: [https://academic.oup.com/ageing/article-pdf/49/5/800/33676931/afaa034.pdf](https://academic.oup.com/ageing/article-pdf/49/5/800/33676931/afaa034.pdf). URL: doi:10.1093/ageing/afaa034. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ageing/afaa034&link_type=DOI) 25. [25]. Marit Mjørud et al. “Time from dementia diagnosis to nursing-home admission and death among persons with dementia: A multistate survival analysis”. In: PLOS ONE 15.12 (Dec. 2020), pp. 1–15. DOI: 10.1371/journal.pone.0243513. URL: 10.1371/journal.pone.0243513. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0243513&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00059715&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F01%2F16%2F2024.01.14.24301299.atom) 26. [26]. Anna Kańtoch, et al. “Explanatory survival model for nursing home residentsa 9-year retrospective cohort study”. In: Archives of Gerontology and Geriatrics 97 (2021), p. 104497. ISSN: 0167-4943. DOI: 10.1016/j.archger.2021.104497. URL: [https://www.sciencedirect.com/science/article/pii/S0167494321001606](https://www.sciencedirect.com/science/article/pii/S0167494321001606). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.archger.2021.104497&link_type=DOI) 27. [27]. Richard J Woodman and Arduino A Mangoni. “A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future”. In: Aging Clinical and Experimental Research (2023), pp. 1–35. 28. [28]. Christopher R Martens, Devin Wahl, and Thomas J LaRocca. “Personalized medicine: will it work for decreasing age-related morbidities?” In: Aging. Elsevier, 2023, pp. 683–700. 29. [29]. W James Deardorff, et al. “Development and external validation of a mortality prediction model for communitydwelling older adults with dementia”. In: JAMA Internal Medicine 182.11 (2022), pp. 1161–1170. 30. [30]. Simone P Rauh et al. “Predicting Mortality in Nursing Home Residents With Dementia and Pneumonia Treated With Antibiotics: Validation of a Prediction Model in a More Recent Population”. In: The Journals of Gerontology: Series A 74.12 (Nov. 2018), pp. 1922–1928. ISSN: 1079-5006. DOI: 10.1093/gerona/gly260. eprint: [https://academic.oup.com/biomedgerontology/article-pdf/74/12/1922/30703884/gly260.pdf](https://academic.oup.com/biomedgerontology/article-pdf/74/12/1922/30703884/gly260.pdf). URL: 10.1093/gerona/gly260. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/gerona/gly260&link_type=DOI) 31. [31]. M. Falcone et al. “Predictors of mortality in nursing-home residents with pneumonia: a multicentre study”. In: Clinical Microbiology and Infection 24.1 (2018), pp. 72–77. ISSN: 1198-743X. DOI: 10.1016/j.cmi.2017.05.023. URL: [https://www.sciencedirect.com/science/article/pii/S1198743X17302835](https://www.sciencedirect.com/science/article/pii/S1198743X17302835). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cmi.2017.05.023&link_type=DOI) 32. [32]. Robert T Olender, Sandipan Roy, and Prasad S Nishtala. “Application of machine learning approaches in predicting clinical outcomes in older adults–a systematic review and meta-analysis”. In: BMC geriatrics 23.1 (2023), p. 561. 33. [33]. Annette Spooner, et al. “A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction”. In: Scientific reports 10.1 (2020), p. 20410. 34. [34]. Liqin Wang, et al. “Development and validation of a deep learning algorithm for mortality prediction in selecting patients with dementia for earlier palliative care interventions”. In: JAMA network open 2.7 (2019), e196972–e196972. 35. [35]. Ravi B. Parikh et al. “Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer”. In: JAMA Network Open 2.10 (Oct. 2019), e1915997–e1915997. ISSN: 2574-3805. DOI: 10.1001/jamanetworkopen.2019.15997. eprint: [https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2753527/parikh\_2019\_oi\_190606.pdf](https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2753527/parikh_2019_oi_190606.pdf). URL: https://doi.org/10.1001/jamanetworkopen.2019.15997. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jamanetworkopen.2019.15997&link_type=DOI) 36. [36]. Van Tran, et al. “Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach”. In: BMC Infectious Diseases 22.1 (2022), p. 655. 37. [37]. Vicent Blanes-Selva, et al. “Responsive and Minimalist App Based on Explainable AI to Assess Palliative Care Needs during Bedside Consultations on Older Patients”. In: Sustainability 13.17 (2021), p. 9844. 38. [38]. Shayan Mostafaei, et al. “Machine learning algorithms for identifying predictive variables of mortality risk following dementia diagnosis: a longitudinal cohort study”. In: Scientific Reports 13.1 (2023), p. 9480. 39. [39]. Gary S Collins, et al. “Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) the TRIPOD statement”. In: Circulation 131.2 (2015), pp. 211–219. 40. [40]. Nicole L Pratt, et al. “The validity of the Rx-Risk comorbidity index using medicines mapped to the anatomical therapeutic chemical (ATC) classification system”. In: BMJ open 8.4 (2018), e021122. 41. [41].InterRAI Long-Term Care Facilities (LTCF) Assessment Form and User’s Manual, Australian Edition. 9.1.2. InterRAI, 2020. ISBN: 978-1-936065-60-8. URL: [https://catalog.interrai.org/content/interrai-long-term-care-facilities-ltcf-assessment-form-and-users-manual-australian-edition](https://catalog.interrai.org/content/interrai-long-term-care-facilities-ltcf-assessment-form-and-users-manual-australian-edition). 42. [42]. Catherine Travers et al. “Validation of the interRAI Cognitive Performance Scale against independent clinical diagnosis and the Mini-Mental State Examination in older hospitalized patients”. In: The journal of nutrition, health & aging 17 (2013), pp. 435–439. 43. [43]. Katherine Penny et al. “Convergent validity, concurrent validity, and diagnostic accuracy of the interRAI depression rating scale”. In: Journal of Geriatric Psychiatry and Neurology 29.6 (2016), pp. 361–368. 44. [44]. Jeff Poss, et al. “Development of the interRAI Pressure Ulcer Risk Scale (PURS) for use in long-term care and home care settings”. In: BMC geriatrics 10 (2010), pp. 1–10. 45. [45]. Trivellore E Raghunathan, et al. “A multivariate technique for multiply imputing missing values using a sequence of regression models”. In: Survey methodology 27.1 (2001), pp. 85–96. 46. [46]. Stef Van Buuren. “Multiple imputation of discrete and continuous data by fully conditional specification”. In: Statistical methods in medical research 16.3 (2007), pp. 219–242. 47. [47]. Lihong Qi et al. “Strategies for imputing missing covariates in accelerated failure time models”. In: Statistics in Medicine 37.24 (2018), pp. 3417–3436. DOI: 10.1002/sim.7809. eprint: [https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7809](https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7809). URL: [https://onlinelibrary.wileycom/doi/abs/10.1002/sim.7809](https://onlinelibrary.wileycom/doi/abs/10.1002/sim.7809). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/sim.7809&link_type=DOI) 48. [48]. Maritza Mera-Gaona, et al. “Evaluating the impact of multivariate imputation by MICE in feature selection”. In: Plos one 16.7 (2021), e0254720. 49. [49]. Sebastian Pölsterl. “scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn”. In: Journal of Machine Learning Research 21.212 (2020), pp. 1–6. URL: [http://jmlr.org/papers/v21/20-729.html](http://jmlr.org/papers/v21/20-729.html). 50. [50].XGBoost Development Team. XGBoost: Extreme Gradient Boosting. [https://pypi.org/project/xgboost/](https://pypi.org/project/xgboost/). Version 1.7.6. Python Package. 2023. 51. [51]. Robert Tibshirani. “Regression Shrinkage and Selection via the Lasso”. In: Journal of the Royal Statistical Society: Series B (1996). 52. [52]. Arthur E. Hoerl and Robert W. Kennard. “Ridge Regression: Biased Estimation for Nonorthogonal Problems”. In: Technometrics (1970). 53. [53]. Hui Zou and Trevor Hastie. “Regularization and variable selection via the elastic net”. In: Journal of the Royal Statistical Society: Series B (2005). 54. [54]. Robert Tibshirani. “The lasso method for variable selection in the Cox model”. In: Statistics in Medicine (1997). 55. [55]. Jerome Friedman. “Greedy function approximation: A gradient boosting machine”. In: Annals of Statistics (2001). 56. [56]. Tianqi Chen and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting System”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). 57. [57]. Leo Breiman. “Random Forests”. In: Machine Learning (2001). 58. [58]. Nirav Shah et al. “Clinical Analytics Prediction Engine (CAPE): Development, electronic health record integration and prospective validation of hospital mortality, 180-day mortality and 30-day readmission risk prediction models”. In: PLOS ONE 15.8 (Aug. 2020), pp. 1–15. DOI: 10.1371/journal.pone.0238065. URL: 10.1371/journal.pone.0238065. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0238065&link_type=DOI) 59. [59]. Alexandru Niculescu-Mizil and Rich Caruana. “Predicting Good Probabilities with Supervised Learning”. In: Proceedings of the 22nd International Conference on Machine Learning. ICML ‘05. Bonn, Germany: Association for Computing Machinery, 2005, pp. 625–632. ISBN: 1595931805. DOI: 10.1145/1102351.1102430. URL: 10.1145/1102351.1102430. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1145/1102351.1102430&link_type=DOI) 60. [60]. Sivaramakrishnan Rajaraman, Prasanth Ganesan, and Sameer Antani. “Deep learning model calibration for improving performance in class-imbalanced medical image classification tasks”. In: PLOS ONE 17.1 (Jan. 2022), pp. 1–23. DOI: 10.1371/journal.pone.0262838. URL: 10.1371/journal.pone.0262838. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0262838&link_type=DOI) 61. [61]. Frank E. Harrell, Kerry L. Lee, and Daniel B. Mark. “Evaluating the yield of medical tests”. In: Journal of the American Medical Association (1982). 62. [62]. Frank E. Harrell. “Regression Modeling Strategies”. In: Springer (2015). 63. [63]. James A. Hanley and Barbara J. McNeil. “The meaning and use of the area under a receiver operating characteristic (ROC) curve”. In: Radiology (1982). 64. [64]. Eva Graf et al. “Multistate survival models for panel data: The msm package for R”. In: Journal of Statistical Software (1999). 65. [65]. Christopher R. Manz et al. “Validation of a Machine Learning Algorithm to Predict 180-Day Mortality for Outpatients With Cancer”. In: JAMA Oncology 6.11 (Nov. 2020), pp. 1723–1730. ISSN: 2374-2437. DOI: 10.1001/jamaoncol.2020.4331. eprint: [https://jamanetwork.com/journals/jamaoncology/articlepdf/2770698/jamaoncology\_manz\_2020\_oi\_200068\_1604947261.21354.pdf](https://jamanetwork.com/journals/jamaoncology/articlepdf/2770698/jamaoncology\_manz\_2020_oi_200068_1604947261.21354.pdf).URL: doi:10.1001/jamaoncol.2020.4331. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jamaoncol.2020.4331&link_type=DOI) 66. [66]. David W Hosmer Jr., Stanley Lemeshow, and Rodney X Sturdivant. Applied Logistic Regression. 3rd ed. John Wiley & Sons, 2013. 67. [67]. Alex Gramegna and Paolo Giudici. “SHAP and LIME: An Evaluation of Discriminative Power in Credit Risk”. In: Frontiers in Artificial Intelligence (2021), p. 140. 68. [68]. Scott M. Lundberg and Su-In Lee. “A unified approach to interpreting model predictions”. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Curran Associates, 2017, pp. 4768–4777. 69. [69]. Ross Bicknell et al. “A study protocol for the development of a multivariable model predicting 6-and 12-month mortality for people with dementia living in residential aged care facilities (RACFs) in Australia”. In: Diagnostic and prognostic research 4 (2020), pp. 1–8. 70. [70]. Robin L Kruse, et al. “Using mortality risk scores for long-term prognosis of nursing home residents: caution is recommended”. In: Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences 65.11 (2010), pp. 1235–1241. 71. [71]. Jenny T van der Steen et al. “Prediction of 6-month mortality in nursing home residents with advanced dementia: validity of a risk score”. In: Journal of the American Medical Directors Association 8.7 (2007), pp. 464–468. 72. [72].interRAI. *interRAI Palliative Care (PC) Assessment Form and User’s Manual*. 9.1.2. 130-page manual plus 8-page form and 6-page form. Standard English Edition: interRAI, 2023, p. 144. ISBN: 978-1-936065-28-8.