medRxiv

Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling

Alexandre Filiot, Ridouane Ghermi, Antoine Olivier, Paul Jacob, Lucas Fidon, Alice Mac Kain, Charlie Saillard, Jean-Baptiste Schiratti
doi: https://doi.org/10.1101/2023.07.21.23292757
Affiliation (all authors): Owkin, Inc., New York, NY, USA
For correspondence: alexandre.filiot{at}owkin.com

Abstract

Computational pathology is revolutionizing the field of pathology by integrating advanced computer vision and machine learning technologies into diagnostic workflows. It offers unprecedented opportunities for improved efficiency in treatment decisions by allowing pathologists to achieve higher precision and objectivity in disease classification, tumor microenvironment description and identification of new biomarkers. However, the potential of computational pathology in personalized medicine comes with significant challenges, particularly in annotating whole slide images (WSI), which is time-consuming, costly and subject to inter-observer variability. To address these challenges, Self-Supervised Learning (SSL) has emerged as a promising solution to learn representations from histology patches and leverage large volumes of unlabelled WSI. Recently, Masked Image Modeling (MIM) has emerged as an SSL framework and is now considered to outperform purely contrastive learning paradigms. In this work, we therefore explore the application of MIM to histology using iBOT, a self-supervised transformer-based framework. Through a wide range of 17 downstream tasks over seven cancer indications, both at the slide and patch levels, we provide recommendations on the pre-training of large models for histology data using MIM. First, we demonstrate that in-domain pre-training with iBOT outperforms both ImageNet pre-training and a model pre-trained with a purely contrastive learning objective, MoCo v2. Second, we show that Vision Transformer (ViT) models, when scaled appropriately, have the capability to learn pan-cancer representations that benefit a large variety of downstream tasks. Finally, our iBOT ViT-Base model (80 million parameters), pre-trained on more than 40 million histology images from 16 different cancer types, achieves state-of-the-art performance in most weakly-supervised WSI classification tasks compared to other SSL frameworks available in the literature. This paves the way for the development of a foundation model for histopathology. Our code, models and features are publicly available at https://github.com/owkin/HistoSSLscaling.

Competing Interest Statement

All authors are employees of Owkin, Inc., New York, NY, USA.

Funding Statement

This work was granted access to the HPC resources of IDRIS under the allocations 2022-AD011012519 and 2023-AD011012519R1 made by GENCI.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Added the link to our public GitHub repository.

Data Availability

The results published in this work are partly based upon data generated by the TCGA Research Network (TCGA). All images and the associated clinical outcome for TCGA cohorts used in this study are publicly available at https://portal.gdc.cancer.gov/ and cBioPortal https://www.cbioportal.org/. Regarding the PAIP dataset, de-identified pathology images and annotations used in this research were prepared and provided by the Seoul National University Hospital by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316).

https://portal.gdc.cancer.gov/

https://www.cbioportal.org/

http://www.wisepaip.org/paip

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 14, 2023.
Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling
Alexandre Filiot, Ridouane Ghermi, Antoine Olivier, Paul Jacob, Lucas Fidon, Alice Mac Kain, Charlie Saillard, Jean-Baptiste Schiratti
medRxiv 2023.07.21.23292757; doi: https://doi.org/10.1101/2023.07.21.23292757

Subject Area

  • Pathology