Reducing Inequalities Using an Unbiased Machine Learning Approach to Identify Births with the Highest Risk of Preventable Neonatal Deaths
=========================================================================================================================================

* Antonio P. Ramos
* Fabio Caldieraro
* Marcus L. Nascimento
* Rafael Saldanha

## Abstract

**Background** Despite recent declines in neonatal mortality, studies show the existence of left-behind populations that continue to have higher mortality rates than the national averages. Additionally, many of these deaths are from preventable causes. This creates the need for more precise methods to identify high-risk births so that policymakers can target them effectively. This study fills this gap by developing unbiased machine-learning approaches to more accurately identify births with a high risk of neonatal death from preventable causes.

**Methods** We link administrative databases from the Brazilian health ministry to obtain birth and death records in the country from 2015 to 2017. The final dataset comprises 8,797,968 births, of which 59,615 newborns died within the first 28 days of life (neonatal deaths). These neonatal deaths are categorized into preventable (42,290) and non-preventable (17,325) deaths. Our analysis estimates the death risk for the former group, as these deaths are amenable to policy interventions. We train six machine-learning algorithms, test their performance on unseen data, and evaluate them using a new policy-oriented metric. To avoid biased policy recommendations, we also investigate how our approach affects disadvantaged populations.

**Results** XGBoost was the best-performing algorithm for our task: the 5% of births with the highest predicted risk from this model capture more than 85% of the actual preventable deaths. Furthermore, the risk predictions exhibit no statistically significant differences in the proportion of actual preventable deaths captured from disadvantaged populations, defined by race, education, marital status, and maternal age. The results are similar for other threshold levels.

**Conclusions** We show that, by using publicly available administrative datasets and ML methods, it is possible to identify the births with the highest risk of preventable death with a high degree of accuracy. This is useful for policymakers, who can target health interventions to those who need them most and where they can be effective, without producing bias against disadvantaged populations. Overall, our approach can guide policymakers in reducing neonatal mortality rates and the associated health inequalities. Finally, it can be adapted for use in other developing countries.

Keywords
*   Algorithmic bias
*   Health care
*   Health inequality
*   Machine learning
*   Neonatal mortality
*   Targeting program

## 1 Background

In recent years, many countries have made considerable progress in reducing early-life mortality (ELM), and many are on track to achieve the United Nations’ Sustainable Development Goals (SDGs). These reductions are important and are associated with improved health outcomes 1,2. However, health disparities remain high, even in countries on track to achieve the SDGs. These disparities may exist among ethnic groups, geographic regions, and levels of education, to mention a few subgroups 3,4. This is particularly concerning for deaths from preventable causes, for which effective interventions are available 5.

International agencies and local policymakers have recognized these disparities. The most common approach to identifying high-risk groups has been to stratify mortality rates by subgroups, such as gender, socioeconomic status, and geographic location. While useful for some purposes, this approach ignores within-group variability: children from the same subgroup may have very different mortality risks. Recent studies show that within-group variability is higher than between-group variability 6,7.

The decline in mortality rates makes it even more important to adopt methods that can precisely identify those who still have a high risk of preventable death. This is particularly salient when only a fraction of the population can receive the needed intervention, for two reasons. First, in many contexts in the developing world, resources are scarce, while at-risk individuals may demand considerable attention; hence the importance of not squandering resources on those who do not truly need them. Second, the smaller the population that can receive an intervention, the more difficult it is to correctly identify the individuals who should be targeted. In this paper, we develop and explore a new approach. Using a large administrative dataset with individual-level information about each birth, we employ machine learning (ML) models to estimate the risk of preventable neonatal death for new, unseen births.

In doing so, we aim to help local health professionals and policymakers identify which children need special attention, rather than relying on preconceived risk factors. We develop a data-driven approach that combines several risk factors and provides digested information to healthcare providers and policymakers about the neonates who need more attention.

This is particularly useful, for example, in Brazilian regions where teams in the public health care system (SUS) may be responsible for 2000 to 3500 individuals 8; with so many patients under their care, identifying those at highest risk can be very challenging. The SUS is the world’s largest government-run public healthcare system by number of beneficiaries, land area coverage, and affiliated network, with more than one million healthcare providers 9. Based on our methodology, an easy-to-use app 1 could assist healthcare teams on the ground with their targeting strategies by assigning a risk score to each neonate under their care.

Additionally, we apply a new metric, appropriate for public health professionals and policymakers, to evaluate the performance of the ML models we develop. The performance of many ML algorithms is judged by criteria such as accuracy, specificity, or the F1 score, which are difficult to interpret for policy purposes. Our metric instead evaluates how useful a given ML algorithm is for identifying births at high risk of death from preventable causes.

A critical feature of any life-saving intervention is that it can only save the lives of those who would otherwise have died. Even a “miracle drug” that can counteract any cause of death can only reduce mortality if given to children who, without it, would have died. Because of this, interventions that cannot be given universally must be carefully targeted to those at the highest risk of mortality (absent the intervention) to be efficient. Our method addresses this issue by ranking births by their risk of preventable death.

This approach aligns with recent trends in medicine and other fields toward “evidence-based health policies”, in which medical decisions, including clinical decisions, should be informed by scientific evidence 10,11,12,13,14. It is also in line with recent trends in personalized medicine, which is becoming prominent in other fields of medicine and public health, as the risk assignments are estimated at the individual level 15,16,17.

We also address concerns about bias in ML algorithms, given recent literature showing that the application of these methods risks favoring privileged populations 18,19,20,21,22. Our models do not exhibit this behavior: they capture similar proportions of preventable neonatal deaths from advantaged and less-advantaged populations.

## 2 Methods

### 2.1 Approach

In this research, the unit of analysis is the individual birth. We aim to identify the births with the highest risk of neonatal death from preventable causes. To this end, we included all the available information from the administrative databases that contributed to improving the precision of our targeting.

### 2.2 Data sources

We use administrative databases from the Brazilian health ministry to obtain birth and death records for the entire country from 2015 to 2017, as well as information about health facilities, professionals, and available equipment. All data are available at [https://datasus.saude.gov.br](https://datasus.saude.gov.br), organized into three health information systems, described below: SINASC (Sistema de Informações sobre Nascidos Vivos), SIM (Sistema de Informação sobre Mortalidade), and CNES (Cadastro Nacional de Estabelecimentos de Saúde).

SINASC includes all live births in the Brazilian territory, recording epidemiological and administrative information about the mothers and children. SIM, in turn, includes all deaths in the territory, containing epidemiological and administrative information about the deaths and their circumstances. Fetal deaths are not considered, as they are beyond the scope of this paper. Finally, CNES records a snapshot of Brazilian health facilities at a point in time. Together, these systems provide three tables covering live births, deaths, and health facilities.

To merge SIM and SINASC data, we used the field NUMERODN, which contains a unique number identifying each live birth. SIM records include this number for deaths occurring within the first year of life. Subsequently, we merged the result with CNES data using the CNES number, a unique identifier for health facilities present in both the SINASC and CNES databases. The resulting raw dataset totals 8,829,944 records.

Before merging the three databases, we identified and removed duplicate observations in the SIM and SINASC tables to avoid inconsistencies. We then performed deterministic linkage on the deduplicated tables.

We then applied a few additional cleaning steps to the raw dataset. SIM records that were not linked to a SINASC record were discarded. We also excluded a small number of residual records with no birthdate, as well as records in which the difference between the date of death and the birthdate was negative. The resulting cleaned dataset comprises 8,797,968 births and 59,615 neonatal deaths.
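The linkage and cleaning steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R pipeline: only NUMERODN is a field name taken from the text; the record layouts, the other attribute names, the "keep the first duplicate" rule, and the definition of a neonatal death as one occurring before 28 days of age are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical, simplified record layouts. NUMERODN is the real SINASC/SIM
# linkage field named in the text; all other attribute names are illustrative.
@dataclass
class Birth:  # one SINASC row
    numerodn: str
    birth_date: Optional[date]
    cnes: str  # health-facility identifier

@dataclass
class Death:  # one SIM row
    numerodn: Optional[str]
    death_date: date

def dedupe(records, key):
    """Keep the first record seen for each key value (an assumed rule)."""
    seen, kept = set(), []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept

def link_records(births, deaths, facilities):
    """Deterministic linkage: SINASC -> SIM on NUMERODN, SINASC -> CNES on the facility code."""
    births = dedupe(births, key=lambda b: b.numerodn)
    # SIM rows without NUMERODN cannot be linked; SIM rows whose NUMERODN
    # matches no birth are never looked up below, so they are discarded.
    deaths = dedupe([d for d in deaths if d.numerodn], key=lambda d: d.numerodn)
    death_by_id = {d.numerodn: d for d in deaths}

    rows = []
    for b in births:
        if b.birth_date is None:
            continue  # residual record with no birthdate
        d = death_by_id.get(b.numerodn)
        age_days = (d.death_date - b.birth_date).days if d else None
        if age_days is not None and age_days < 0:
            continue  # death recorded before birth: inconsistent record
        rows.append({
            "numerodn": b.numerodn,
            "facility": facilities.get(b.cnes),  # None if the code is not in CNES
            "age_at_death_days": age_days,
            "neonatal_death": age_days is not None and age_days < 28,
        })
    return rows
```

Because births drive the loop, unlinked SIM records drop out naturally, which mirrors the rule that SIM records without a SINASC match were not considered.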

### 2.3 Feature Engineering

Our set of features 23 consists of the following variables: place of delivery, health facility type, maternal age at birth, sex, 1-min Apgar score, 5-min Apgar score, birth weight, gestational age, week of gestation, pregnancy type, delivery type, maternal education, presence of congenital anomaly, maternal ethnicity, antenatal visits, month of first antenatal visit, presentation type, induced labor, professional who assisted the labor, number of previous live births, number of previous fetal losses and abortions, number of previous pregnancies, number of previous vaginal deliveries, and number of previous cesarean deliveries. In addition, we also used marital status and state of birth. The definition and type of each feature are given in Table 1.

[Table 1:](http://medrxiv.org/content/early/2024/01/13/2024.01.12.24301163/T1)

Table 1: Features information: administrative database, type, and description.

[Table 2:](http://medrxiv.org/content/early/2024/01/13/2024.01.12.24301163/T2)

Table 2: Predictive performance for preventable neonatal mortality on the test set for each machine learning algorithm.

We analyze a nominal categorical target variable with three possible outcomes: alive, preventable death 24, and non-preventable death. Non-preventable deaths include deaths from external causes and ill-defined deaths. There are 42,290 preventable deaths and 17,325 non-preventable deaths.

To improve analysis efficiency, categorical variables were stored as codes, relabeled according to the data dictionaries issued by DataSUS using the package *microdatasus* 25. We imputed missing data using the package *Amelia* 26. Both packages are available in the R Statistical Software repository 27.

As a pre-processing step, we centered and scaled the data by subtracting the mean and dividing by the standard deviation. We also identified and excluded features with zero or near-zero variance. Finally, we filtered out highly correlated features. Details are available upon request.
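A minimal standard-library Python sketch of these three steps follows. The authors ran them in R, and the paper does not report the thresholds, so the values below are assumptions: a 95/5 frequency ratio and at most 10% distinct values for near-zero variance, and an absolute correlation cutoff of 0.9, modeled on what we understand to be caret's defaults. The greedy correlation filter is a simplification, not caret's exact algorithm.

```python
import math
from collections import Counter
from statistics import mean, stdev

def near_zero_variance(col, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a zero- or near-zero-variance column. Assumed rule: the most
    common value is more than freq_cut times as frequent as the second most
    common AND distinct values make up at most unique_cut percent of rows."""
    counts = sorted(Counter(col).values(), reverse=True)
    if len(counts) == 1:
        return True  # constant column: zero variance
    freq_ratio = counts[0] / counts[1]
    pct_unique = 100.0 * len(counts) / len(col)
    return freq_ratio > freq_cut and pct_unique <= unique_cut

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def preprocess(columns, corr_cutoff=0.9):
    """columns: dict of feature name -> list of numeric values."""
    # 1. Drop zero / near-zero variance features.
    kept = {n: c for n, c in columns.items() if not near_zero_variance(c)}
    # 2. Greedy correlation filter: keep a feature only if its absolute
    #    correlation with every feature already kept is <= corr_cutoff.
    selected = {}
    for name, col in kept.items():
        if all(abs(pearson(col, other)) <= corr_cutoff for other in selected.values()):
            selected[name] = col
    # 3. Center and scale: subtract the mean, divide by the standard deviation.
    out = {}
    for name, col in selected.items():
        m, s = mean(col), stdev(col)
        out[name] = [(v - m) / s for v in col]
    return out
```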

### 2.4 Modeling

The final dataset is partitioned into training and test sets: 7,038,375 observations (80.00% of the total) are used to train six different machine learning algorithms, while 1,759,593 observations (20.00% of the total) are used to evaluate the performance of our targeting criterion on new unseen data.

We estimated the risk of preventable neonatal death for each birth in the dataset using flexible ML methods based on the features above. These methods were logistic regression, least absolute shrinkage and selection operator (LASSO) regression, elastic-net regularized logistic regression (elastic net), random forest (RF), extreme gradient boosting over trees (XGBoost), and neural networks (NNs). We used the package *caret*, available in the R Statistical Software 27, to run the machine learning algorithms.

Logistic regression 28 is the standard linear model that estimates a parameter $\beta_j$ for each feature $j$ by maximizing the logistic likelihood, that is, by minimizing the negative log-likelihood. LASSO 29 uses an $L_1$-norm penalty to regularize, or “shrink”, the model and prevent overfitting. It is similar to logistic regression but adds to the negative log-likelihood a penalty term equal to $\lambda \sum_j |\beta_j|$, where the parameter $\lambda$ is a non-negative real number that determines the strength of the regularization.

Elastic net 30 combines the $L_1$-norm ($\sum_j |\beta_j|$) and $L_2$-norm ($\sum_j \beta_j^2$) penalties to regularize the model. It minimizes the negative log-likelihood plus a penalty term equal to $\lambda \left[ \alpha \sum_j |\beta_j| + \frac{1 - \alpha}{2} \sum_j \beta_j^2 \right]$, where $\alpha \in [0, 1]$ and $\lambda \ge 0$. As particular cases, elastic net comprises LASSO regression ($\alpha = 1$) and logistic regression ($\lambda = 0$).
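To make the penalty terms concrete, the sketch below evaluates the penalized objective for a binary logistic model (for example, preventable death versus all other outcomes, a simplification of the paper's three-class target). It assumes the glmnet-style parameterization, in which the $L_2$ term carries a factor of 1/2 and the log-likelihood is averaged over observations; this scaling is our assumption about how the reported $\lambda$ is defined. The code only evaluates the objective and does not fit the model.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def penalized_nll(beta0, beta, X, y, lam=0.001, alpha=0.5):
    """Mean negative log-likelihood of a binary logistic model plus the
    elastic-net penalty lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).
    The intercept beta0 is not penalized. Defaults mirror the fixed values
    reported for elastic net (alpha = 0.5, lambda = 0.001)."""
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, xi))
        # -[y log p + (1 - y) log(1 - p)] with p = sigmoid(eta)
        nll += softplus(eta) - yi * eta
    nll /= len(y)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return nll + lam * (alpha * l1 + (1 - alpha) / 2 * l2)
```

Setting `alpha=1.0` gives the LASSO objective and `lam=0.0` recovers unpenalized logistic regression, matching the particular cases named above.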

We first used cross-validation to choose the parameters $\alpha$ and $\lambda$ in elastic net and the parameter $\lambda$ in LASSO. However, the resulting models did not perform as well as logistic regression. We therefore fixed $\alpha = 0.5$ and $\lambda = 0.001$ in elastic net, and $\lambda = 0.001$ in LASSO. The *glmnet* method was used for all three algorithms.

RF 31 and XGBoost 32 are tree-based algorithms. The simplest tree-based algorithms are classification and regression trees (CART 33). These single-tree models recursively partition the observations into groups with similar outcome values using cutoff values of the features. Although single-tree models are easy to interpret, their performance is frequently poor and very sensitive to small changes in the input data. By combining many trees, RF and XGBoost improve on single-tree performance. RF averages the estimates of a set of trees, each trained on a random subset of the observations and considering a random subset of the features at each split. XGBoost also combines many trees, but it starts with one tree and iteratively trains each new tree on the errors of the previous ones.
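The idea that each new tree is trained on the errors of the previous ones can be illustrated with a toy gradient-boosting loop. This is a deliberately simplified sketch, not XGBoost itself: it uses one-split trees (stumps) on a single feature and squared-error residuals, whereas XGBoost works with first- and second-order gradients of a classification loss and adds regularization. The learning rate of 0.3 matches the value reported for XGBoost in this section; everything else is illustrative.

```python
def fit_stump(x, r):
    """Best single split on 1-D feature x, minimizing the squared error of
    the residuals r. Requires at least two distinct values of x."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def boost(x, y, n_rounds=50, eta=0.3):
    """Start from the mean, then repeatedly fit a stump to the current
    residuals and add a shrunken (eta-scaled) copy of it to the ensemble."""
    base = sum(y) / len(y)
    trees = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + eta * tree(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + eta * sum(t(v) for t in trees)
```

With a small learning rate, each stump corrects only part of the remaining error, which is why boosting typically needs many rounds (250 in the reported XGBoost configuration).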

For RF, we used the *ranger* method with (i) 500 trees per forest, (ii) the number of variables randomly sampled as candidates at each split (mtry) set to 5 (the square root of the number of features), (iii) the minimal node size (min.node.size) set to 1, and (iv) the Gini index as the splitting rule (splitrule). For XGBoost, we used the *xgbTree* method with (i) the number of boosting iterations (nrounds) set to 250, (ii) the learning rate ($\eta \in (0, 1)$) set to 0.3 to prevent overfitting, (iii) the maximum tree depth (max_depth) set to 4, (iv) the proportion of features considered for tree construction (colsample_bytree) set to the interval (0.6, 1), and (v) the proportion of training observations used for modeling (subsample) set to the interval (0.5, 1).
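For reference, the reported settings can be collected in one place. The Python dictionaries below only restate the text (the key names are our own labels, loosely following the argument names quoted above) and are not calls into any library. The helper checks the "square root of the number of features" heuristic for mtry: counting the 26 features listed in Section 2.3 (our own count, taken before the variance and correlation filters, whose output size is not reported), floor(sqrt(26)) = 5, and any feature count from 25 to 35 gives the same value.

```python
import math

# Restatement of the reported settings; key names are our own labels.
RF_SETTINGS = {
    "method": "ranger",
    "num_trees": 500,
    "mtry": 5,
    "min_node_size": 1,
    "splitrule": "gini",
}
XGB_SETTINGS = {
    "method": "xgbTree",
    "nrounds": 250,
    "eta": 0.3,
    "max_depth": 4,
    "colsample_bytree": (0.6, 1.0),  # reported as an interval
    "subsample": (0.5, 1.0),         # reported as an interval
}

def sqrt_mtry(n_features):
    """Common random-forest heuristic: floor(sqrt(p)) candidate features per split."""
    return math.floor(math.sqrt(n_features))
```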

A neural network (NN) 34 consists of an input layer, one or more hidden layers, and an output layer. The input layer takes the features without processing them. All processing takes place in the hidden layers, whose results are passed on to the output layer, the final layer, which returns the value produced by the learning process in the hidden layers. The nodes, also known as artificial neurons, are connected, and each node is characterized by the weights of its incoming connections, a threshold (bias), and an activation function. In the classical threshold formulation, a node is activated and sends data to the next layer only when its output exceeds the threshold; smooth activation functions, such as the logistic function, generalize this by passing on a continuous value.

Although we tested specifications with more than one hidden layer using the *mlpML* method, they performed similarly to the network with a single hidden layer. Thus, our application employed the *mlp* method with a single hidden layer of 25 nodes.
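A forward pass through a network of the kind described (one hidden layer of 25 nodes) can be sketched as follows. The sketch is illustrative only: the logistic hidden activation, the softmax output over the three target classes of Section 2.3 (alive, preventable death, non-preventable death), the 26 inputs, and the random, untrained weights are all assumptions, and training by backpropagation is omitted.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic (sigmoid) activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with logistic units, softmax output over classes."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    m = max(logits)  # subtract the max before exponentiating, for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    W = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
n_features, n_hidden, n_classes = 26, 25, 3  # hypothetical input size
W1, b1 = init_layer(n_features, n_hidden, rng)
W2, b2 = init_layer(n_hidden, n_classes, rng)
probs = mlp_forward([1.0] * n_features, W1, b1, W2, b2)
```

`probs` holds one predicted probability per class; in a trained model, the entry for preventable death would serve as the risk score used for ranking.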

### 2.5 Performance Metrics

For our task, we did not find it useful to adopt traditional prediction performance metrics, such as classification accuracy, confusion matrices, specificity/sensitivity, or precision/recall, all of which require a threshold for deciding when a risk score is high enough to merit a warning. These metrics can be misleading when applied to rare outcomes, such as the one we focus on. In our case, a model that predicted no neonatal deaths at all would be right more than 99% of the time, yet it would be useless, as it would not allow us to identify those who should be targeted. Nor do we find it useful to adopt “threshold-free” approaches, such as the area under the ROC curve (ROC-AUC), which summarize accuracy without depending on a single threshold, because they are difficult to translate into policy-relevant terms in our context.

We instead recognize that when resources are constrained (only a certain fraction of cases can be acted on), it is natural to compute the proportion of deaths captured within a given fraction of births with the highest predicted mortality risk. For example, suppose a policymaker can only provide an intervention to the 5% (or 10%) of births that need it most. The threshold can then be set to whatever fraction of high-risk births the available resources allow targeting. An appropriate model should therefore concentrate a substantial share of neonatal deaths in a small percentage of the highest-risk births.
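The metric can be sketched in a few lines. This is a minimal illustration, not the authors' code: `risk` holds a model's predicted risk of preventable death for each test-set birth, and `died` flags whether a preventable neonatal death occurred. Rounding the number of targeted births to the nearest integer and breaking ties by input order are our assumptions.

```python
def share_of_deaths_captured(risk, died, fraction):
    """Proportion of all deaths that fall among the top `fraction` of births
    ranked by predicted risk (e.g. fraction=0.05 for the top 5%)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(died)
    if total == 0:
        raise ValueError("no deaths in the evaluation sample")
    n = len(risk)
    k = max(1, round(fraction * n))  # number of births that can be targeted
    # Rank births from highest to lowest predicted risk; ties keep input order.
    ranked = sorted(range(n), key=lambda i: risk[i], reverse=True)
    return sum(died[i] for i in ranked[:k]) / total
```

Evaluating this function over a grid of fractions (1%, 5%, 10%, and so on) traces one curve per model. A model that ranked births at random would, on average, capture roughly the same share of deaths as the share of births targeted, so the gap above that diagonal is what makes a model useful for targeting.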

### 2.6 Algorithmic Bias

Algorithmic bias 18,19,20,21,22 is a well-documented problem with striking implications for health care and public policy. Therefore, besides concentrating a substantial share of preventable neonatal deaths in a small percentage of high-risk births, our targeting criterion should also avoid disadvantaging the most vulnerable groups.

To check that our preferred model does not disadvantage the most vulnerable populations, we evaluated its performance for four sub-groups identified using the demographic variables in our dataset: newborns of *non-white mothers*, *low-education mothers*, *underage mothers*, and *single mothers*. We use the test sample as a reference and compare its composition with that of the individuals with the highest predicted risk of preventable neonatal death at different threshold levels. To do so, we construct confidence intervals based on the Normal approximation for the mortality rate of each group and check whether these intervals contain the respective rates in the test sample. We also perform hypothesis tests to verify whether the proportion of preventable deaths captured by the algorithm ($\hat{p}$) is statistically equal to the proportion of preventable deaths in the test sample ($p$). The null hypothesis is $H_0\colon \hat{p} = p$ and the alternative hypothesis is $H_1\colon \hat{p} \neq p$.
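The confidence intervals and tests described above can be sketched with the Normal approximation to the binomial. In this illustration, which is our reading of the setup rather than the authors' code, `successes` is the number of captured preventable deaths that belong to a sub-group, `n` is the total number of preventable deaths captured at a given threshold, and `p0` is the sub-group's share of preventable deaths in the full test sample, treated as known (a simplifying assumption). The counts in the test are hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def one_sample_z_test(successes, n, p0):
    """Two-sided test of H0: proportion == p0 using the Normal approximation.
    Returns the z statistic and the p-value."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

Rejecting $H_0$ when the p-value falls below 0.05 corresponds to the 95% level used for the tests reported in the Results.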

## 3 Results

Recall that we compute our performance metric by ranking births by their predicted mortality risk, setting threshold levels for the highest-risk fraction, and calculating, for each threshold, the percentage of the test sample's preventable neonatal deaths that fall within it. Figure 1 summarizes the results.

![Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/13/2024.01.12.24301163/F1.medium.gif)


Figure 1: Performance of the different ML methods.

XGBoost is our best model in terms of predictive performance. With this algorithm, selecting the 5% of births in the test sample with the highest predicted risk captures 85% of preventable neonatal deaths. XGBoost is never worse than the competing methods and is thus selected as our preferred model.

To check that our preferred model does not disadvantage the most vulnerable populations, we evaluated its performance for the four sub-groups of newborns from disadvantaged populations described in the Algorithmic Bias subsection.

In the first analysis, we compared, for each sub-group, the percentage of disadvantaged individuals selected as high-risk with the proportion of disadvantaged individuals in our test sample. Figure 2 reports these results.

![Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/13/2024.01.12.24301163/F2.medium.gif)


Figure 2: Proportion of individuals selected by the model. Note: In each graph, the horizontal lines depict the proportion of each ELM in the test set. The data points mark the proportion of actual ELM captured by our model and their confidence intervals.

The figure shows that, for nearly all thresholds, our algorithm selects a significantly higher proportion of individuals from the disadvantaged sub-groups as high-risk than their share of the test set. Only at the highest thresholds, when the algorithm selects nearly the entire test set, does the proportion of disadvantaged individuals selected converge to their actual proportion in the test set.
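The comparison in Figure 2 can be sketched as follows. This is an illustration with hypothetical inputs, not the authors' code: `risk` holds predicted risks and `in_group` flags births whose mother belongs to a given sub-group, and rounding the number of selected births to the nearest integer is our assumption. As the threshold approaches 100%, the selected set becomes the whole sample, so the two shares necessarily coincide, which is the convergence described above.

```python
def selected_vs_overall_share(risk, in_group, fraction):
    """Return (share of the sub-group among the top `fraction` of births by
    predicted risk, share of the sub-group in the whole sample)."""
    n = len(risk)
    k = max(1, round(fraction * n))
    # Rank births from highest to lowest predicted risk; ties keep input order.
    ranked = sorted(range(n), key=lambda i: risk[i], reverse=True)
    selected_share = sum(in_group[i] for i in ranked[:k]) / k
    overall_share = sum(in_group) / n
    return selected_share, overall_share
```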

A natural question is whether selecting these individuals distorts the number of preventable deaths captured by the algorithm. Figure 3 depicts the analysis of preventable deaths identified per sub-group. It shows no statistically significant differences between the proportion of actual preventable deaths from disadvantaged populations among the selected at-risk births and the corresponding proportion in the test sample. Therefore, our preferred model is neither biased against nor in favor of underserved groups. Rather, using our algorithm would provide a fair inclusion of each population in terms of actual preventable deaths.

![Figure 3:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/13/2024.01.12.24301163/F3.medium.gif)


Figure 3: Proportion of actual ELM captured by the model. Note: In each graph, the horizontal lines depict the proportion of each ELM in the test set. The data points mark the proportion of actual ELM captured by our model and their confidence intervals.

![Figure 4:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/01/13/2024.01.12.24301163/F4.medium.gif)


Figure 4: XGBoost feature importance.

To further highlight these results, Table 3 presents the outcomes of the hypothesis tests at the 5% significance level. The null hypothesis is not rejected for any of the thresholds and sub-groups in the table. This corroborates that using our algorithm would provide a fair inclusion of each population in terms of actual preventable deaths.

[Table 3:](http://medrxiv.org/content/early/2024/01/13/2024.01.12.24301163/T3)

Table 3: Tests for proportions at a confidence level of 95%: comparing the proportion of preventable deaths captured by the algorithm and in the test sample.

Taken together, these results show that our algorithm selects a higher proportion of individuals from the less-advantaged populations as high-risk births while capturing statistically equal proportions of births with preventable deaths for both disadvantaged and privileged sub-groups.

## 4 Discussion

Our research puts forward a new analytic approach that pairs large administrative datasets with ML models to identify, with a high degree of accuracy, the births with the highest risk of preventable death. Furthermore, the approach selects statistically equal proportions of births with preventable deaths from disadvantaged and privileged populations, alleviating concerns about algorithmic bias.

We can think of our approach as a calculator that produces a risk score for each birth. These scores can help teams of medical professionals direct resources more precisely to those with the highest risk of preventable death. As discussed in the Background section, doctors and healthcare professionals are often overwhelmed by the number of patients under their care, and this system can aid them in deciding how to allocate scarce resources.

Our approach is not a replacement for healthcare professionals, whose subject-matter expertise should not be ignored. Instead, we offer them an additional tool. This tool can be particularly useful in settings such as the Brazilian public health system (SUS), in which healthcare teams are responsible for a large number of patients.

Our recommendations relate to a large body of literature in medicine and public health that develops risk scores to identify individuals at risk of some event. These scores have been applied to a variety of outcomes 35,36, and our results suggest that such scores may also be useful for identifying neonates with a high risk of preventable death. We also address a common concern: algorithmic bias 18,19,20,21,22. Our approach is not biased in favor of more privileged groups. Instead, it classifies a higher proportion of individuals from less privileged populations as high-risk births to reach a statistically equal proportion of preventable deaths from more and less privileged groups, alleviating concerns about negative bias in ML algorithms.

It is common to use poverty status as a proxy for at-risk births37. Our findings should not be interpreted as recommending against this practice. On the contrary, we implicitly include poverty in our analysis by considering several risk factors that correlate with it.

Our study presumes that healthcare providers can make the right intervention to save at-risk newborns; this is why we rank the risk of “preventable deaths.” Whether this holds, of course, depends on the actual capacity of policymakers and healthcare providers to intervene correctly in preventable death cases.

## 5 Conclusion

Our approach can guide Brazilian policymakers in reducing neonatal mortality rates and health inequalities by helping them intervene in preventable death cases correctly and without bias. The methods and metrics developed in this paper have broader applicability and are flexible enough to apply to several scenarios in other developing countries. For example, countries with incomplete vital registration systems could use surveys such as the Demographic and Health Surveys (DHS) instead of administrative data. The set of risk factors included may also vary across countries, depending on data availability and on political and public health considerations.

## Data Availability

The data is publicly available. The code used in this analysis will be posted in a public repository.

## Declarations

## Ethics approval and consent to participate

Not applicable.

## Consent for publication

Not applicable.

## Availability of data and materials

The data is publicly available. The code used in this analysis will be posted in a public repository.

## Competing interests

The authors declare that they have no competing interests.

## Funding

The Getúlio Vargas Foundation partially supported this work, award number PAR-004.037.019.00009.

## Authors’ contributions

APR and FC designed the study and the statistical analysis and wrote the manuscript. MLN performed the statistical analysis and wrote the manuscript. RS assembled the dataset. All authors approved the final manuscript.

## A List of included features

## B Feature importance

## C Predictive performance

## D Hypothesis tests

## Acknowledgements

The authors thank Chad Hazlett, PhD, for his comments on an early version of this paper, and Isabella Grion for her help in editing the paper.

## Footnotes

*   * tomramos{at}ucla.edu

*   † fabio.caldieraro{at}fgv.br

*   1 We developed an example of such an app and made it available on the Internet at [https://64o4b7-marcus0l0nascimento.shinyapps.io/tent\_app/](https://64o4b7-marcus0l0nascimento.shinyapps.io/tent_app/)

*   Received January 12, 2024.
*   Revision received January 12, 2024.
*   Accepted January 13, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  Johnson RC, Schoeni RF. The Influence of Early-Life Events on Human Capital, Health Status, and Labor Market Outcomes Over the Life Course. The BE Journal of Economic Analysis & Policy. 2011;11(3).
    
    
2.  Currie J, Vogl T. Early-Life Health and Adult Circumstance in Developing Countries. Annual Review of Economics. 2013;5:1–36.
    
    
3.  Smith LK, Manktelow BN, Draper ES, Springett A, Field DJ. Nature of socioeconomic inequalities in neonatal mortality: population based study. BMJ. 2010;341:c6654.
    

4.  Dyer L, Theall KP, Wallace M. Structural racism, racial inequities and urban–rural differences in infant mortality in the US. Journal of Epidemiology & Community Health. 2020;75(8):788–793.
    
    
5.  Bhutta ZA, Das JK, Bahl R, Lawn JE, Salam RA, Paul VK, et al. Can available interventions end preventable deaths in mothers, newborn babies, and stillbirths, and at what cost? Lancet. 2014;384(9940):347–370.
    

6.  Ramos AP, Weiss RE. Measuring Within and Between Group Inequality in Early-Life Mortality Over Time: A Bayesian Approach with Application to India. arXiv preprint arXiv:1804.08570. 2019.
    
    
7.  Ramos AP, Flores MJ, Weiss RE. Leave no child behind: Using data from 1.7 million children from 67 developing countries to measure inequality within and between groups of births and to identify left behind populations. PLOS ONE. 2020;15(10):e0238847.
    
    
8.  Ministério da Saúde. Portaria nº 2.436, de 21 de setembro de 2017. Diário Oficial da União; 2017.
    
    
9.  Castro MC, Massuda A, Almeida G, Menezes-Filho NA, Andrade MV, de Souza Noronha KVM, et al. Brazil’s unified health system: the first 30 years and prospects for the future. The Lancet. 2019;394(10195):345–356.
    
    
10. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71–72.
    

11. Sackett DL. Evidence-based medicine. Seminars in perinatology. 1997;21(1):3–5.
    

12. Giacomini M. Theory-Based Medicine and the Role of Evidence: Why the Emperor Needs New Clothes, Again. Perspectives in Biology and Medicine. 2009;52(2):234–251.
    

13. Bluhm R, Borgerson K. Evidence-Based Medicine. In: Gifford F, editor. Philosophy of Medicine. Elsevier; 2011. p. 203–238.
    
    
14. Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: a quarter century on. The Lancet. 2017;390(10092):415–423.
    
    
15. Hamburg MA, Collins FS. The Path to Personalized Medicine. New England Journal of Medicine. 2010;363:301–304.
    

16. Hayes DF, Markus HS, Leslie RD, Topol EJ. Personalized medicine: risk prediction, targeted therapies and mobile health technology. BMC Medicine. 2014;12:37.
    
    
17. Hoeyer K. Data as promise: Reconfiguring Danish public health through personalized medicine. Social Studies of Science. 2019;49(4):531–555.
    
    
18. Panch T, Mattie H, Atun R. Artificial intelligence and algorithmic bias: implications for health systems. Journal of Global Health. 2019;9(2):020318.
    
    
19. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–453.
    
    
20. Kakani P, Chandra A, Mullainathan S, Obermeyer Z. Allocation of COVID-19 Relief Funding to Disproportionately Black Counties. JAMA. 2020;324(10):1000–1003.
    
    
21. Akter S, McCarthy G, Sajib S, Michael K, Dwivedi YK, D’Ambra J, et al. Algorithmic bias in data-driven innovation in the age of AI. International Journal of Information Management. 2021;60:102387.
    
    
22. Mullainathan S, Obermeyer Z. Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care. Quarterly Journal of Economics. 2022;137(2):679–727.
    
    
23. Batista AFM, Diniz CSG, Bonilha EA, Kawachi I, Filho ADPC. Neonatal mortality prediction with routinely collected data: a machine learning approach. BMC Pediatrics. 2021;21:322.
    
    
24. Malta DC, Duarte EC, de Almeida MF, de Salles Dias MA, de Morais Neto OL, de Moura L, et al. Lista de causas de mortes evitáveis por intervenções do Sistema Único de Saúde do Brasil. Epidemiologia e Serviços de Saúde. 2007;16(4):233–244.
    
    
25. de Freitas Saldanha R, Bastos RR, Barcellos C. Microdatasus: pacote para download e pré-processamento de microdados do Departamento de Informática do SUS (DATASUS). Cadernos de Saúde Pública. 2019;35(9):1–9.
    
    
26. Honaker J, King G, Blackwell M. Amelia II: A Program for Missing Data. Journal of Statistical Software. 2011;45(7):1–47.
    
    
27. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
    
    
28. Kleinbaum DG, Klein M. Logistic Regression. 1st ed. New York: Springer; 2010.
    
    
29. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society, Series B. 1996;58(1):267–288.
    

30. Zou H, Hastie T. Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.
    

31. Breiman L. Random Forests. Machine Learning. 2001;45:5–32.
    

32. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16. New York, NY, USA: ACM; 2016. p. 785–794. Available from: http://doi.acm.org/10.1145/2939672.2939785.
    
    
33. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks; 1984.
    
    
34. Aggarwal CC. Neural Networks and Deep Learning: A Textbook. 1st ed. Cham: Springer; 2018.
    
    
35. Miao C, Bao M, Xing A, Chen S, Wu Y, Cai J, et al. Cardiovascular Health Score and the Risk of Cardiovascular Diseases. PLOS ONE. 2015;10(7):e0131537.
    

36. Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nature Reviews Genetics. 2020;21:493–502.
    

37. Szwarcwald CL, de Andrade CLT, Bastos FI. Income inequality, residential poverty clustering and infant mortality: a study in Rio de Janeiro, Brazil. Social Science & Medicine. 2002;55(12):2083–2092.

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif
 [4]: /embed/inline-graphic-4.gif
 [5]: /embed/inline-graphic-5.gif
 [6]: /embed/inline-graphic-6.gif
 [7]: /embed/inline-graphic-7.gif
 [8]: /embed/inline-graphic-8.gif