medRxiv

Antibody selection strategies and their impact in the analysis of malaria multi-sera data

André Fonseca, Mikolaj Spytek, Przemyslaw Biecek, Clara Cordeiro, Nuno Sepúlveda
doi: https://doi.org/10.1101/2022.10.06.22280719
André Fonseca
1 Faculdade de Ciências e Tecnologia, Universidade do Algarve, Faro, Portugal
2 CEAUL – Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Portugal
Mikolaj Spytek
3 Faculty of Mathematics & Information Science, Warsaw University of Technology, Warsaw, Poland
Przemyslaw Biecek
3 Faculty of Mathematics & Information Science, Warsaw University of Technology, Warsaw, Poland
Clara Cordeiro
1 Faculdade de Ciências e Tecnologia, Universidade do Algarve, Faro, Portugal
2 CEAUL – Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Portugal
Nuno Sepúlveda
2 CEAUL – Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Portugal
3 Faculty of Mathematics & Information Science, Warsaw University of Technology, Warsaw, Poland
  • For correspondence: N.Sepulveda{at}mini.pw.edu.pl

Abstract

Background Publicly available multi-sera data have notably increased the chance of discovering the best antibody candidates for explaining naturally acquired protection against malaria and for detecting exposure to malaria parasites. The analysis of these data is typically divided into a feature selection phase followed by a predictive phase, in which several models are constructed for the outcome of interest. A key question is which features should be included in the predictive phase and in what form.

Results To answer this question, we developed three approaches for classifying malaria-protected and susceptible individuals: (i) a simple approach based on selecting antibodies via the nonparametric Mann-Whitney test; (ii) a dichotomization approach in which each antibody was split at the optimal cut-off, defined as the one maximizing the χ2 statistic for two-way tables; (iii) a hybrid parametric/nonparametric approach that applies a Box-Cox transformation followed by a t-test, then finite mixture models, and uses the Mann-Whitney test as a last resort. We illustrated the application of these three approaches with published serological data for predicting clinical malaria in 121 Kenyan children. The predictive analysis was based on a Super Learner, in which predictions from multiple classifiers were pooled together. The three approaches led to similar areas under the Receiver Operating Characteristic curve (AUC): 0.72 (95% CI = [0.61, 0.82]), 0.80 (95% CI = [0.71, 0.90]) and 0.79 (95% CI = [0.70, 0.88]) for the simple, dichotomization and hybrid approaches, respectively.
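The dichotomization step in approach (ii) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `chi2_2x2` and `best_cutoff`, the use of midpoints between consecutive observed values as candidate cut-offs, the absence of a continuity correction, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_cutoff(values: Sequence[float], protected: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, chi2) for the split `value > cutoff` that maximizes chi2."""
    candidates = sorted(set(values))
    best = (candidates[0], 0.0)
    # Assumption: only midpoints between consecutive distinct values are tried.
    for lo, hi in zip(candidates, candidates[1:]):
        cut = (lo + hi) / 2
        pairs = list(zip(values, protected))
        a = sum(1 for v, p in pairs if v > cut and p)        # high, protected
        b = sum(1 for v, p in pairs if v > cut and not p)    # high, susceptible
        c = sum(1 for v, p in pairs if v <= cut and p)       # low, protected
        d = sum(1 for v, p in pairs if v <= cut and not p)   # low, susceptible
        stat = chi2_2x2(a, b, c, d)
        if stat > best[1]:
            best = (cut, stat)
    return best


# Toy antibody levels (hypothetical numbers, not taken from the Kenyan data set).
levels = [0.2, 0.4, 0.5, 0.9, 1.1, 1.3, 1.6, 2.0]
status = [False, True, False, False, True, True, True, True]
cutoff, stat = best_cutoff(levels, status)
print(f"cut-off = {cutoff:.2f}, chi2 = {stat:.2f}")
```

Because the cut-off is chosen to maximize the statistic, the usual χ2 reference distribution no longer applies directly to the maximized value, so any p-value derived from it would need an adjustment.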

Conclusions The three feature selection strategies provided better predictive performance for the outcome than the previously published results relying on Random Forests alone (AUC = 0.68). Given their similar predictive performance, we recommend that the three strategies be applied together to the same data set and that the final choice be made according to their complexity.
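For readers less familiar with the AUC values compared above, the following Python sketch shows one standard way to compute an AUC: as the proportion of (protected, susceptible) pairs that a score ranks correctly, which equals the Mann-Whitney U statistic divided by the number of pairs. The function name `auc_from_scores` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], protected: Sequence[bool]) -> float:
    """AUC as the fraction of (protected, susceptible) pairs ranked correctly.

    Ties count one half, so the result equals U / (n_p * n_s), where U is the
    Mann-Whitney statistic of the protected group.
    """
    pos = [s for s, p in zip(scores, protected) if p]
    neg = [s for s, p in zip(scores, protected) if not p]
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                u += 1.0
            elif sp == sn:
                u += 0.5
    return u / (len(pos) * len(neg))


# Hypothetical predicted probabilities of protection for six children.
predicted = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
truth = [True, True, True, False, False, False]
auc = auc_from_scores(predicted, truth)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a score that ranks pairs no better than chance, and 1.0 to a score that separates the two groups perfectly.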

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

André Fonseca holds a PhD fellowship from FCT, Fundação para a Ciência e a Tecnologia (ref. SFRH/BD/147629/2019). Clara Cordeiro and Nuno Sepúlveda are partially financed by national funds through FCT under the project UIDB/00006/2020. Nuno Sepúlveda is funded by the Polish National Agency for Academic Exchange (grant ref. PPN/ULM/2020/1/00069/U/00001).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • In this version, we made major revisions: we restructured the overall analysis, made the objective of the paper clearer, and reframed the main ideas in a more intuitive way. In particular, the analysis is now divided into a feature selection step and a predictive analysis step. This framing better motivates the use of different approaches to select the antibodies most associated with malaria disease. A new approach, which uses the chi-squared test to assess associations between antibody values and protection against malaria, was added as a baseline model. A preliminary approach based on the Random Forest was also added to validate previously published results. Finally, a SuperLearner was adopted for the predictive analysis. Compared with fitting several separate models, the SuperLearner is less time-consuming and yields more accurate results.

Data Availability

All data produced are available online at: https://doi.org/10.1371/journal.pcbi.1005812

  • Abbreviation List

    AIC
    Akaike’s Information Criterion
    Ama
    Apical membrane antigen 1
    AUC
    Area Under the Curve
    EBA
    Erythrocyte-binding antigen
    ELISA
    Enzyme-linked immunosorbent assay
    FDR
    False discovery rate
    GOF
    Goodness of fit
    IgG
    Immunoglobulin G
    LASSO
    Least Absolute Shrinkage and Selection Operator
    LDA
    Linear discriminant analysis
    Log
    Logarithmic
    LRM
    Logistic regression model
    MSP
    Merozoite Surface Protein
    MSRP
    MSP7-related proteins
    np
    Number of Protected individuals
    ns
    Number of Susceptible individuals
    Pf
    Plasmodium falciparum
    Prt
    Protected
    QDA
    Quadratic discriminant analysis
    RF
    Random Forest
    ROC
    Receiver Operating Characteristic
    rS
    Spearman Correlation Coefficient
    SeroTAT
    Serological testing and treatment
    SL
    Super Learner
    sPLS-DA
    Sparse partial least squares discriminant analysis
    Sus
    Susceptible
    SVM
    Support vector machine
    SW
    Shapiro-Wilk
    χ2
    Chi-square
    XGB
    Extreme Gradient Boosting
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted September 05, 2023.
    Subject Area

    • Infectious Diseases (except HIV/AIDS)