Abstract
Infant mortality reflects a combination of biological, socioeconomic, and health care assistance factors. Analyzing this problem requires processing large datasets from different domains. Data science approaches have become increasingly widespread for problems that require in-depth analysis of large datasets, and machine learning methods have become popular because they discover knowledge efficiently and effectively by identifying patterns in the feature interactions of such datasets. This work proposes a machine learning approach to evaluate the association between sociodemographic factors and preventable root causes of neonatal mortality. To this end, demographic and epidemiological data from the Brazilian public health birth and mortality information systems (SINASC and SIM, respectively) were used. Clusters were created with an unsupervised approach, the K-Modes clustering algorithm, so that the sociodemographic profile of each cluster could be evaluated and the profiles compared across clusters. A profile consists of features such as maternal age, maternal years of schooling, race, number of consultations, type of delivery, public or private hospital, and date of first prenatal consultation. The analysis used data from 2012 to 2018 for the city of São Paulo, one of the richest regions of the country. Data quality for this region is considered very high, so no data correction methods were needed. In addition, the adopted method does not require assumptions about the data and is suitable for categorical data, which is our case. Considering that this is a data-driven approach, preliminary results indicate that only a few assumptions can be made about the profiles using these features, although some associations between demographic variables and neonatal mortality from preventable root causes can be identified.
We hope to encourage reflection on the newborn in the socioeconomic environment and contribute to public health policies.
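To illustrate the clustering step described in the abstract, the following is a minimal sketch of K-Modes on categorical records. It is not the authors' implementation: the records, category labels, initial modes, and the helper names `hamming`, `column_modes`, and `k_modes` are hypothetical and chosen only for illustration. The sketch assumes the simple matching dissimilarity (number of differing attributes) and updates each cluster center to the per-attribute mode.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(records, initial_modes, max_iter=20):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each mode from its members; keep the old mode if empty.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes

# Hypothetical records: (maternal age band, schooling, delivery, hospital).
records = [
    ("20-34", "8-11y", "vaginal", "public"),
    ("20-34", "8-11y", "vaginal", "public"),
    ("<20", "8-11y", "vaginal", "public"),
    ("35+", "12+y", "cesarean", "private"),
    ("35+", "12+y", "cesarean", "private"),
    ("20-34", "12+y", "cesarean", "private"),
]

# Initial modes are picked by hand here; real runs would use an
# initialization scheme and several restarts.
labels, modes = k_modes(records, [records[0], records[3]])
print(labels)
print(modes)
```

Each resulting mode can be read as the sociodemographic profile of its cluster, which is the kind of comparison the abstract describes. Because the method only counts matching categories, it needs no distributional assumptions about the data.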
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was supported by the Bill & Melinda Gates Foundation (#OPP1201970) and the Ministry of Health of Brazil, through the National Council for Scientific and Technological Development (CNPq) (#443774/2018-8). It was also supported by NVIDIA, which donated a Titan XP GPU used by the research team.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of Brazilian Responsible Ethics Committee, Institution #5473 - Instituto Federal de Educação, Ciência e Tecnologia de São Paulo - IFSP, SUZANO/SP - Brazil, email: cep_ifsp{at}ifsp.edu.br gave ethical approval for this work under approval number #3.193.057. The approval can be validated at https://plataformabrasil.saude.gov.br using project title: "Plataforma de Apoio à Decisão para Políticas Públicas de Saúde Gestacional Baseada em Técnicas de Visualização de Informações e Aprendizado de Máquina".
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
https://doi.org/10.1016/j.dib.2020.106093