Quantifying the Uncertainty of Human Activity Recognition Using a Bayesian Machine Learning Method: A Prediction Study
======================================================================================================================

* Hiroshi Mamiya
* Daniel Fuller

## Abstract

**Background** Machine learning methods accurately predict physical activity outcomes using accelerometer data generated by wearable devices, thus allowing the investigation of the impact of the built environment on population physical activity. While traditional machine learning methods do not provide prediction uncertainty, a relatively new method, Bayesian Additive Regression Trees (BART), can quantify such uncertainty as a posterior predictive distribution. We evaluated the performance of BART in predicting physical activity status.

**Methods** We applied multinomial BART and the benchmark method, random forest, to accelerometer data comprising 25,424 time points generated by wearable devices attached to 37 participants. We evaluated prediction accuracy and confusion matrices using leave-one-out cross-validation.

**Results** BART and random forest demonstrated comparable prediction accuracy.

**Conclusions** BART is a relatively novel ML method and will advance the incorporation of predicted physical activity status into built environment research. Future research includes the evaluation of the association between the built environment and predicted physical activity with and without accounting for prediction uncertainty.

## Introduction

The application of Machine Learning (ML) is expected to increase in Physical Activity (PA) epidemiology (1,2). ML enables the automated measurement of PA intensity or type (e.g., sitting, walking, or vigorous physical activity), where predictors are accelerometer signals generated by wearable devices such as smartphones.
ML can capture complex non-linear functions of a large number of accelerometer variables and has consistently shown superior accuracy in predicting PA categories compared with the traditionally used rule-based classification methods (1, 3). The improved accuracy of predicting PA categories by ML will advance built environment and healthy cities research, which examines the etiologic association between the predicted PA outcome and daily exposure to the modifiable environmental drivers of PA, including walkability and green space (4).

One of the overlooked limitations of utilizing ML-predicted variables in public health research is the lack of prediction uncertainty, i.e., the lack of a confidence interval around the predicted probability of PA status (5–7). Commonly used ML methods to predict PA types, such as random forest, do not provide uncertainty estimates but simply report the most likely PA type as if there were no measurement error. Treating ML-predicted variables as free of uncertainty leads to underestimated standard errors of the estimated etiologic association between built environmental characteristics and ML-predicted PA categories, which can lead to a falsely conclusive association, i.e., a higher type I error (6,8).

Bayesian Additive Regression Trees (BART) is a relatively new ML method with growing applications in epidemiology (9,10). BART provides prediction uncertainty in the form of a probability distribution for each of the PA types to be classified (11,12). Such predictive distributions can then be used to generate multiple predicted values through Monte Carlo sampling, which are then analyzed as if they were multiply imputed values within the second-stage analysis that estimates the impact of the built environment on the predicted PA outcomes (6,13). Thus, BART will effectively allow propagating the prediction uncertainty of PA into built environment research. To our knowledge, BART has not been used in PA research to date.
Thus, the objective of this study is to compare the accuracy of predicting categories representing PA intensity and types using BART and random forest. The latter is a commonly used ML method to predict PA using accelerometer data (1).

## Methods and data

Our data consisted of accelerometer data containing n=25,424 time points at 5-second intervals from 37 research participants (mean number of time points: 687, range: 664-723 points per participant). The participants performed a predefined sequence and duration of six combinations of PA types and intensities on a treadmill. These outcome categories are: lying down, sitting, self-paced walking, and walking at 3 METs (Metabolic Equivalents of Task), 5 METs, and 7 METs, with the latter three intensity categories aligning with the definitions of light, moderate, and vigorous PA (14) (Supplementary Figure 1). This study was approved by the Memorial University Interdisciplinary Committee on Ethics in Human Research (ICEHR 20180188-EX). Accelerometer data were collected by the Ethica Data app installed on a Samsung Galaxy S7 (SM-G930W8) smartphone (15) placed in the right pants or shorts pocket of the participants. Ground truth data for activity types and intensity (i.e., METs) were measured using an Oxycon Pro metabolic cart (Oxycon Pro, Jaeger, Hochberg, Germany) (16). From the accelerometer data, we computed 58 predictive features for each 5-second period, in accordance with previous studies (17,18).

BART is a nonparametric sum-of-trees (additive) ensemble machine learning method, where the contribution of each decision tree to the prediction is kept weak by regularization priors to prevent overfitting (12). The procedure to fit BART is provided in Appendix 1. We fitted multinomial BART using the BART package in R (19). Fitting of the benchmark ML method, random forest, followed our previous papers and used the caret package in R (17,18). Code and data are publicly available (20,21).

The performance evaluation of BART and random forest followed leave-one-out cross-validation at the participant level: the algorithms were fitted to the data from 36 participants and then applied to the test data from the remaining participant. After repeating the process 37 times, once for each participant, we combined the predicted outcomes from the 37 test datasets to compute the model performance metrics in relation to the ground truth. The evaluation metric is accuracy specific to each PA category, which is the number of time points with true positive or true negative status (numerator) divided by the number of all time points (denominator). We also created a confusion matrix and calibration plots. Unlike previous activity prediction research, we also computed accuracy by participants’ self-reported gender identity (female and male) for BART, given the potential heterogeneity of prediction accuracy and uncertainty across gender (22).

## Results

The prediction accuracy of BART and random forest was similar (Table 1), and both showed the lowest accuracy for identifying self-paced walking. The confusion matrix (Figure 1) indicates that BART tended to confuse lying with sitting and self-paced walking with walking at 3 METs. Calibration plots also suggest lower goodness-of-fit for self-paced walking and walking at 3 METs (Supplementary Figures 2C and 2D). While accuracy and the confusion matrix provide the overall prediction error of BART and random forest in the entire test data, BART also informs about pointwise prediction uncertainty for each sample unit (time point). For example, results from one time point (Supplementary Figure 3) illustrate largely overlapping posterior predictive probability distributions of self-paced walking and walking at 3 METs, indicating the challenge in distinguishing these two PA categories.
This is because participants chose an average self-paced walking intensity of 2.7 METs, making it challenging for the algorithms to distinguish between these intensities. This is also indicated by the corresponding MCMC traces at the same time point (Supplementary Figure 4). Accuracy across participants was highly variable, ranging from 47% to 94% (Supplementary Figure 5). There were noticeable variations in accuracy across genders, but both genders showed the lowest accuracy for self-paced walking (Table 2).

[Table 1.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/T1) Comparison of accuracy between random forest and Bayesian Additive Regression Trees.

[Table 2.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/T2) Comparison of accuracy between male and female participants by Bayesian Additive Regression Trees.

[Figure 1.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F1) Confusion matrix indicating the posterior mean of the frequency of agreement between the reference (directly observed) and classified (predicted) activity types by Bayesian Additive Regression Trees. Numbers in off-diagonal elements indicate the estimated frequency of misclassification. Abbreviation: METs, Metabolic Equivalents of Task.

## Discussion

The measurement of PA is likely to shift from traditional rule-based methods to ML-based methods due to the accuracy of the latter approach, thereby increasing the validity of built environment research that examines the impact of the built environment on population physical activity (2,23).
However, the inability of traditionally used ML algorithms to quantify prediction uncertainty has received scarce attention to date (23,24), and the use of predicted variables in another analysis (e.g., built environment research) without incorporating prediction uncertainty can lead to a falsely conclusive association (6). We demonstrated a novel approach to capture such uncertainty using BART, a Bayesian ML method that, to our knowledge, has not been utilized in PA research.

BART showed comparable classification accuracy to random forest, a widely used ML method for the prediction of PA. However, both algorithms showed lower accuracies than reported in previous studies (1). The lower accuracies may be due to the wide age variation of participants in our data, which may have led to a larger variation in accelerometer noise (25,26). In addition, many extant studies used split-sample or k-fold cross-validation that partitions data from the same participants into training and test data, rather than leave-one-out cross-validation at the participant level, and thus reported inflated accuracy estimates due to data leakage (27,28). Finally, our study used both PA intensity (various levels of METs) and type as categories for the outcome variable, while the majority of studies used only PA types (1). These differences make direct comparison of our results with previous research challenging and underscore the importance of standardized evaluation of ML algorithms on publicly accessible accelerometer data in the future (29). While our example is multi-category (multinomial) prediction, BART is readily applicable to the prediction of binary and continuous outcomes, for example continuous PA intensity measured in METs.

BART allows measuring the uncertainty of the prediction of PA, as seen in the width and overlap of posterior predictive distributions across categories.
This is unlike traditionally used ML algorithms, which do not provide interval estimates or predictive distributions but only report the single most likely PA type, as if the predicted status were true without potential prediction error. The probability distributions captured by BART readily propagate prediction uncertainty into the second-stage epidemiologic analysis incorporating the ML-predicted PA variables, in a manner similar to probabilistic bias correction or multiple imputation (30). BART is also a potential alternative to bootstrapping-based solutions for capturing uncertainty in conventional ML (7,24), which are likely to be computationally expensive as accelerometer data rapidly become large.

As a limitation, the current implementations of BART have difficulty with the convergence of the parameter estimation technique, Markov Chain Monte Carlo (MCMC), as the number of observations (hundreds of thousands) and predictors increases (9). Hence, we had to reduce the sample size by aggregating accelerometer signals into 5-second intervals. Since BART demonstrated accuracy comparable to that of random forest, we plan to further assess its predictive performance in other publicly available datasets, including accelerometer data generated under free-living (non-laboratory) conditions (29), followed by an epidemiologic investigation of the association between exposure to the built urban environment and the predicted PA variable, with and without accounting for uncertainty. Such work will not only improve the quality of evidence on the impact of the urban environment, but also advance epidemiologic methodology for incorporating ML-predicted variables. Finally, we caution against the use of socio-demographic and economic status as predictors, since these personal characteristics are often used as confounders or effect measure modifiers in the second-stage etiologic analysis.
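
As a hedged illustration of the propagation step discussed above, the following Python sketch draws several plausible activity labels per time point from category probabilities and pools a second-stage estimate with Rubin's rules, as is done in multiple imputation. Everything in it is an assumption made for illustration and not the authors' analysis: the probability rows are made up, the names `draw_categories`, `second_stage`, and `pool_rubin` are hypothetical, the second-stage quantity (share of time points at 5 METs or above) stands in for an etiologic model, and its binomial variance ignores the autocorrelation of accelerometer data.

```python
import random
from statistics import mean, variance

# Category order assumed for illustration:
# 0 lying, 1 sitting, 2 self-paced walking, 3 = 3 METs, 4 = 5 METs, 5 = 7 METs
PROB_ROWS = [  # made-up predicted probabilities, one row per time point
    (0.70, 0.25, 0.02, 0.01, 0.01, 0.01),
    (0.05, 0.05, 0.45, 0.40, 0.04, 0.01),
    (0.01, 0.01, 0.08, 0.30, 0.50, 0.10),
    (0.01, 0.01, 0.03, 0.05, 0.30, 0.60),
]


def draw_categories(prob_rows, rng):
    """Sample one category index per time point from its probability row."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in prob_rows]


def second_stage(labels):
    """Toy second-stage estimate: share of time points at 5 METs or above."""
    n = len(labels)
    share = sum(1 for c in labels if c >= 4) / n
    return share, share * (1.0 - share) / n  # (estimate, binomial variance)


def pool_rubin(estimates, variances):
    """Pool per-draw estimates with Rubin's rules; returns (estimate, variance)."""
    m = len(estimates)
    within = mean(variances)        # average within-draw variance
    between = variance(estimates)   # between-draw (sample) variance
    return mean(estimates), within + (1.0 + 1.0 / m) * between


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
results = [second_stage(draw_categories(PROB_ROWS, rng)) for _ in range(20)]
pooled, total_var = pool_rubin([r[0] for r in results], [r[1] for r in results])
print(f"pooled share: {pooled:.3f}, total variance: {total_var:.4f}")
```

The between-draw term is the part that an analysis of a single most-likely label would omit, so the pooled variance is never smaller than the average within-draw variance.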

## Supporting information

Supplementary file: supplements/294126_file03.pdf

## Data Availability

The de-identified accelerometer data are available publicly at: [https://doi.org/10.7910/DVN/LXVZRC](https://doi.org/10.7910/DVN/LXVZRC).

## Declarations

### Ethics approval and consent to participate

This study was approved by the Memorial University Interdisciplinary Committee on Ethics in Human Research (ICEHR 20180188-EX).

### Consent for publication

Consent for publication was obtained from the study participants.

### Availability of data and materials

Code to prepare the data and to fit and evaluate the machine learning algorithms is available publicly at the following address: [https://github.com/hiroshimamiya/BART\_PhysicalActivity](https://github.com/hiroshimamiya/BART_PhysicalActivity). The de-identified accelerometer data are available publicly at: [https://doi.org/10.7910/DVN/LXVZRC](https://doi.org/10.7910/DVN/LXVZRC).

### Competing interests

None

### Funding

The research was supported by a postdoctoral fellowship from the Artificial Intelligence for Public Health (AI4PH) training platform, Canadian Institutes of Health Research. The funder had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

### Authors’ contributions

Hiroshi Mamiya conceived the study, designed the analytical protocol, and wrote the manuscript. Daniel Fuller provided input on the analytical design, collected data, and critically reviewed the manuscript.

## Supplementary Digital Contents

**Appendix 1:** Description of Bayesian Additive Regression Trees

Bayesian Additive Regression Trees (BART) is a nonparametric ensemble machine learning algorithm utilizing a sum of regression trees. Rather than relying on a single complex prediction model, BART combines multiple regression trees that provide good predictive performance as a whole.
The trees are kept simple (allowed to have only a few branches) to prevent overfitting, so that good out-of-sample prediction performance is achieved. To model our multinomial outcome containing six physical activity categories, we used the BART package in R (1). Specifically, the package uses a probit model to estimate the conditional probability of a response category $j$ as:

![Formula][1]

for $j = 1, \ldots, K - 1$, with $K = 6$ outcome categories, at the $i$-th time point in the accelerometer data. The function $\Phi$ represents the standard normal cumulative distribution function, with $\mu_{ij}$ representing the intercept and $f_j(x_i)$ representing the sum-of-trees function for category $j$, whose predictive features calculated from the accelerometer signals at the $i$-th time point are denoted $x_i$. The general form of the ensemble BART function is represented as the sum of $m$ trees as

![Formula][2]

where $T_h$ represents the structure of the $h$-th tree and $M_h$ represents the collection of its leaf parameters as:

![Formula][3]

for the total of $B_h$ leaves on the $h$-th regression tree. As BART is a fully Bayesian model, prior probabilities for these parameters need to be specified. The prior on tree depth imposes weak learning (trees with shallow depth), which is accomplished by assigning the probability of a node at depth $d$ being non-terminal as:

![Formula][4]

We used the recommended values of $\alpha = 0.95$ and $\beta = 2$, such that the probability of a given tree having a complicated structure (a large number of branches) is kept low (2). The centered prior for the $t$-th leaf parameter covers a large probability range, $(\Phi[-3.0], \Phi[3.0])$, and is specified as:

![Formula][5]

with larger $k$ and $m$ inducing a stronger shrinkage of the leaf parameters towards zero. We used $k = 2$ and $m = 50$ trees for our model, as suggested (1). To estimate the posterior distribution of the parameter vector $y_{ij}$, we ran 2,000 burn-in iterations, followed by 2,000 samples for inference with thinning at every 100 iterations.
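
The prior formulas in this appendix are embedded as images, so as a rough numerical illustration the sketch below assumes the standard BART prior forms from Chipman et al.: a split probability of α(1 + d)^(−β) for a node at depth d, and a leaf prior standard deviation of 3/(k√m) for the probit model. Both forms, and the helper names `split_probability` and `leaf_prior_sd`, are assumptions made for illustration and are not taken from the authors' code.

```python
import math


def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at the given depth is non-terminal
    (assumed form: alpha * (1 + depth) ** -beta)."""
    return alpha * (1.0 + depth) ** (-beta)


def leaf_prior_sd(k=2.0, m=50):
    """Prior standard deviation of a leaf parameter
    (assumed form for probit BART: 3 / (k * sqrt(m)))."""
    return 3.0 / (k * math.sqrt(m))


# Settings reported in the appendix: alpha = 0.95, beta = 2, k = 2, m = 50.
for depth in range(4):
    print(f"depth {depth}: P(split) = {split_probability(depth):.4f}")
print(f"leaf prior sd: {leaf_prior_sd():.4f}")
```

Under these assumed forms, the split probability falls from 0.95 at the root to about 0.06 at depth 3, which is how the prior keeps individual trees shallow, and each leaf parameter has a prior standard deviation of about 0.21.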

From the three time series of accelerometer signals that represent the X, Y, and Z axes, we generated the 58 predictive features described previously (3). The accuracy metric for multi-category (as opposed to binary) prediction represents the proportion of agreement between the predicted and observed (i.e., ground truth) categories. We computed this category-specific accuracy metric for each of the 2,000 MCMC iterations and averaged the values to obtain the posterior mean of class-specific accuracy. Note that the predicted category for each time point was determined as the category with the highest estimated predicted probability, that is,

![Graphic][6]

For instance, the predicted category at the $i$-th time point is $\hat{y}_i = 2$ when the category-specific probabilities at this time point are (0.1, 0.5, 0.1, 0.1, 0.1, 0.1). It is possible to use these probability outputs from BART to calculate probabilistic accuracy measures, e.g., the Brier score. However, we converted these probabilities into the categorical measure for comparison purposes, as the latter measure is the standard output from random forest.

[Supplementary Figure 1.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F2) Standardized activity protocol, illustrating the temporal transition of activity types and intensities performed by participants. Abbreviation: METs, metabolic equivalents of task.

[Supplementary Figure 2.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F3) **A-F**: Calibration plots of Bayesian Additive Regression Trees (BART) showing the goodness-of-fit between the decile of predicted probabilities and the proportion of outcomes that are: A) lying, B) sitting, C) self-paced walking, D) running at 3 METs, E) running at 5 METs, and F) running at 7 METs. Translucent lines represent the iterations of Markov Chain Monte Carlo. The x-axis indicates the decile of predicted probability, and the y-axis indicates the proportion of the outcome corresponding to the deciles of predicted probabilities. Panels C and D show a larger miscalibration (discordance) between the predicted and observed outcome probabilities, with BART underestimating the outcomes when predicting low probabilities and overestimating them when predicting high probabilities. On the other hand, BART generated predictive probabilities that closely match the observed frequency when predicting running at 7 METs (Panel F).

[Supplementary Figure 3.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F4) Posterior predictive distribution of six activity categories at one time point, where the estimated distributions of walking and running (3 METs) are largely overlapping. Abbreviation: METs, Metabolic Equivalents of Task.

[Supplementary Figure 4.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F5) Trace plots of a time point corresponding to Supplementary Figure 3 above, indicating the mixing of the Markov chain for each of the six activity categories.

[Supplementary Figure 5.](http://medrxiv.org/content/early/2023/08/22/2023.08.16.23294126/F6) Participant-specific accuracy by Bayesian Additive Regression Trees: posterior mean and 95% credible interval.

## Acknowledgements

We thank SeyedJavad Khataeipour for providing code to fit and evaluate the random forest algorithm.

## Abbreviations

BART: Bayesian Additive Regression Trees; CI: Credible Interval; ML: Machine Learning; PA: Physical Activity

* Received August 16, 2023.
* Revision received August 16, 2023.
* Accepted August 22, 2023.
* © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Narayanan A, Desai F, Stewart T, Duncan S, Mackay L. Application of raw accelerometer data and machine-learning techniques to characterize human movement behavior: a systematic scoping review. Journal of Physical Activity and Health. 2020 Mar 1;17(3):360–83.
2. Trost SG. Population-level physical activity surveillance in young people: are accelerometer-based measures ready for prime time? International Journal of Behavioral Nutrition and Physical Activity. 2020 Mar 18;17(1):28.
3. Lee IM, Moore CC, Evenson KR. Maximizing the utility and comparability of accelerometer data from large-scale epidemiologic studies. J Meas Phys Behav. 2023;6(1):6–12.
4. Kärmeniemi M, Lankila T, Ikäheimo T, Koivumaa-Honkanen H, Korpelainen R. The built environment as a determinant of physical activity: a systematic review of longitudinal studies and natural experiments. Annals of Behavioral Medicine. 2018 Feb 17;52(3):239–51.
5. Ananth CV, Brandt JS. Fetal growth and gestational age prediction by machine learning. The Lancet Digital Health. 2020 Jul 1;2(7):e336–7.
6. Zhang T, Geng G, Liu Y, Chang HH. Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM2.5 Components. Atmosphere (Basel). 2020 Nov;11(11):1233.
7. Kompa B, Snoek J, Beam AL. Second opinion needed: communicating uncertainty in medical machine learning. npj Digit Med. 2021 Jan 5;4(1):1–6.
8. Chang HH, Peng RD, Dominici F. Estimating the acute health effects of coarse particulate matter accounting for exposure measurement error. Biostatistics. 2011 Oct;12(4):637–52.
9. Hill J, Linero A, Murray J. Bayesian additive regression trees: a review and look forward. Annual Review of Statistics and Its Application. 2020;7(1):251–78.
10. Dorie V, Hill J, Shalit U, Scott M, Cervone D. Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition [Internet]. arXiv; 2018 [cited 2022 Dec 4]. Available from: [http://arxiv.org/abs/1707.02641](http://arxiv.org/abs/1707.02641)
11. Logan BR, Sparapani R, McCulloch RE, Laud PW. Decision making and uncertainty quantification for individualized treatments using Bayesian Additive Regression Trees. Stat Methods Med Res. 2019 Apr 1;28(4):1079–93.
12. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. Ann Appl Stat. 2010 Mar 1;4(1).
13. Mesa-Frias M, Chalabi Z, Vanni T, Foss AM. Uncertainty in environmental health impact assessment: quantitative methods and perspectives. Int J Environ Health Res. 2013;23(1):16–30.
14. Ainsworth BE, Haskell WL, Herrmann SD, Meckes N, Bassett DRJ, Tudor-Locke C, et al. 2011 compendium of physical activities: a second update of codes and MET values. Medicine & Science in Sports & Exercise. 2011 Aug;43(8):1575.
15. Ethica. Ethica Data. [cited 2023 May 8]. Ethica Data. Available from: [https://ethicadata.com](https://ethicadata.com)
16. Ismail M, Sana’a A, Loucks-Atlinson A, Atkinson M, Kelly L, Alkanani T, et al. Multiple propane gas flow rates procedure to determine accuracy and linearity of indirect calorimetry systems: an experimental assessment of a method. 2019 Feb 23;
17. Khataeipour SJ, Anaraki JR, Bozorgi A, Rayner M, Basset FA, Fuller D. Predicting lying, sitting and walking at different intensities using smartphone accelerometers at three different wear locations: hands, pant pockets, backpack. BMJ Open Sport & Exercise Medicine. 2022 May 1;8(2):e001242.
18. Fuller D, Anaraki JR, Simango B, Rayner M, Dorani F, Bozorgi A, et al. Predicting lying, sitting, walking and running using Apple Watch and Fitbit data. BMJ Open Sport & Exercise Medicine. 2021 Apr 1;7(1):e001004.
19. Sparapani R, Spanbauer C, McCulloch R. Nonparametric machine learning and efficient computation with Bayesian additive regression trees: the BART R package. Journal of Statistical Software. 2021 Jan 14;97:1–66.
20. Mamiya H. Codes for BART -Physical Activity [Internet]. 2023 [cited 2023 Jun 16]. Available from: [https://github.com/hiroshimamiya/BART\_physicalActivity](https://github.com/hiroshimamiya/BART_physicalActivity)
21. Fuller D, Mamiya H. Replication Data for: Application of Bayesian Additive Regression Tree to quantify the uncertainty of machine-learning derived variables: a case study in human activity patterns learned from accelerometer data [Internet]. Harvard Dataverse; 2023 [cited 2023 Jun 16]. Available from: [https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LXVZRC](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LXVZRC)
22. Musa SB, Ellis R, Chafe B, Sturrock SL, Maher RA, Cullen K, et al. Wearable device validity in measuring steps, energy expenditure, and heart rate across age, gender, and body mass index: data analysis from a systematic review. Journal of Physical Activity and Health. 2022 Dec 19;1(aop):1–6.
23. Ogburn EL, Rudolph KE, Morello-Frosch R, Khan A, Casey JA. A warning about using predicted values from regression models for epidemiologic inquiry. American Journal of Epidemiology. 2021 Jun 1;190(6):1142–7.
24. Naimi AI, Platt RW, Larkin JC. Machine learning for fetal growth prediction. Epidemiology. 2018 Mar;29(2):290–8.
25. Storti KL, Pettee KK, Brach JS, Talkowski JB, Richardson CR, Kriska AM. Gait speed and step-count monitor accuracy in community-dwelling older adults. Medicine and Science in Sports and Exercise. 2008;40(1):59–64.
26. Cyarto EV, Myers AM, Tudor-Locke C. Pedometer Accuracy in Nursing Home and Community-Dwelling Older Adults. Medicine and Science in Sports and Exercise. 2004;36(2):205–9.
27. Dong Q. Leakage Prediction in Machine Learning Models When Using Data from Sports Wearable Sensors. Computational Intelligence and Neuroscience. 2022 May 17;2022:e5314671.
28. Hannun A, Guo C, van der Maaten L. Measuring data leakage in machine-learning models with fisher information [Internet]. arXiv; 2021 [cited 2023 May 23]. Available from: [http://arxiv.org/abs/2102.11673](http://arxiv.org/abs/2102.11673)
29. Fuller D, Ferber R, Stanley K. Why machine learning (ML) has failed physical activity research and how we can improve. BMJ Open Sport Exerc Med. 2022;8(1):e001259.
30. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. International Journal of Epidemiology. 2014 Dec 1;43(6):1969–85.

[1]: /embed/graphic-4.gif
[2]: /embed/graphic-5.gif
[3]: /embed/graphic-6.gif
[4]: /embed/graphic-7.gif
[5]: /embed/graphic-8.gif
[6]: /embed/inline-graphic-1.gif