Abstract
X-ray imaging in Digital Imaging and Communications in Medicine (DICOM) format is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This is an obstacle to deploying artificial intelligence (AI) solutions for analyzing medical images, which often requires identifying the right body part before feeding the image into a specific AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Unfortunately, to the best of our knowledge, there is no open tool or framework for this task to date. To fill this gap, we introduce a DICOM Imaging Router that deploys deep convolutional neural networks (CNNs) for categorizing unknown DICOM X-ray images into five anatomical groups: abdominal, adult chest, pediatric chest, spine, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images has been collected and manually classified. We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were then evaluated on an independent test set of 2,419 images and showed superior performance in classifying the body parts. Specifically, our best performing model (i.e., MobileNet-V1) achieved a recall of 0.982 (95% CI, 0.977–0.988), a precision of 0.985 (95% CI, 0.975–0.989) and an F1-score of 0.981 (95% CI, 0.976–0.987), whilst requiring less computation for inference (0.0295 seconds per image). External validation on 1,000 X-ray images shows the robustness of the proposed approach across hospitals. These results indicate that deep CNNs can accurately and effectively differentiate human body parts from X-ray scans, thereby providing potential benefits for a wide range of applications in clinical settings. The dataset, code, and trained deep learning models from this study will be made publicly available on our project website at https://vindr.ai/datasets/bodypartxr.
1. Introduction
X-ray is the most commonly performed procedure in clinical practice. More than 600 million X-ray examinations are conducted yearly [3] for evaluating various parts of the human body, such as the lungs, heart, bowel, and bones. In recent decades, many automatic medical image analysis systems, particularly deep learning-based systems, have been studied and deployed to support radiologists in interpreting X-ray scans. To date, around a hundred AI software products for clinical radiology [15] have been introduced. These systems are often developed for analyzing specific anatomies (e.g., lung, abdomen, spine) and often require identifying the body part contained in the input image. Vast, non-normalized databases of X-ray images from hospitals raise the need for an automated approach to classify body parts from X-ray scans. An automatic system for accurate classification of body parts from X-ray scans helps identify the right input for AI systems. It is also a useful tool for data management at hospitals or medical centers. Several body part recognition systems, which relied on carefully hand-crafted features, have been introduced [1, 7]. In particular, machine learning-based algorithms [1, 12] have been applied and have shown superior performance on this task. We observed two limitations of the existing approaches. First, these methods were developed and tested on the ImageCLEF 2015 dataset, which is quite small, with 500 training images and 250 test images. This raises concerns [10] about the robustness of the predictive models in real clinical contexts. Second, an automatic body part recognition system acts as an image router and therefore requires near-perfect performance (close to 100%) in recognizing the images. Meanwhile, the existing approaches reported an accuracy of about 80%–85%, which is not reliable enough for deployment in real-world clinical settings.
Hence, this work aims to develop a highly accurate deep learning-based system for grouping unknown X-ray images into five anatomical groups: abdominal X-ray, adult chest X-ray, pediatric chest X-ray, spine X-ray, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images has been collected and manually classified. We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were then evaluated on an independent test set of 2,419 images and showed superior performance in classifying the body parts while requiring less computation for inference. To summarize, the main contributions of this work are twofold:
First, we introduce and release a large-scale dataset for the classification of body parts from X-ray scans. The dataset contains 16,093 X-ray images in DICOM format, each of which was manually annotated with one of five anatomical groups: abdominal X-ray, adult chest X-ray, pediatric chest X-ray, spine X-ray, and others. To the best of our knowledge, this is the largest X-ray dataset for the human body part classification task to date. It will be opened for public access from https://vindr.ai/datasets/bodypartxr.
Second, we develop a robust DICOM Imaging Router that uses a state-of-the-art deep CNN model to classify X-ray images based on the body part present in the image. Our experimental results show superior performance on an independent test set while requiring less computation for inference. The proposed system offers potential benefits for a wide range of applications in clinical settings. It was made publicly available at https://github.com/vinbigdata-medical/DICOM-Imaging-Router for the community as an open deep learning framework that can be easily reused and fine-tuned.
2. Methodology
2.1. DICOM Imaging Router: System overview
An overview of the DICOM Imaging Router is illustrated in Figure 1. It is a deep learning-based classifier that accepts an unknown X-ray as input and classifies it into one of five groups: abdominal X-ray, adult chest X-ray, pediatric chest X-ray, spine X-ray, and others. From a practical point of view, a reliable DICOM Imaging Router should meet two essential requirements: (1) a classification accuracy of nearly 100%, and (2) a low inference time. To achieve these goals, we collect and annotate a large-scale X-ray dataset. We then train a set of state-of-the-art lightweight CNN models. Mathematically, this is a supervised multi-class classification task that assigns a class label to each input example. We are given a training dataset of N labeled examples $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, where $x^{(i)} \in \mathbb{R}^n$ is the i-th X-ray example and $y^{(i)} \in \{1, \dots, K\}$ is the i-th class label. Here, K denotes the number of classes.
Figure 1: We develop a deep learning-based classifier for automatic recognition of body parts from X-ray scans. Given an unknown X-ray as input, the system classifies the scan into one of five groups: adult chest X-ray, pediatric chest X-ray, spine X-ray, abdominal X-ray, and others. In a simple practical scenario, each classified image can then be passed to the corresponding AI model.
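The routing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class order in `CLASSES`, the `route` function, and the downstream handlers are hypothetical and are not taken from the released code.

```python
from typing import Callable, Dict, Sequence

# Assumed class order for this sketch; the actual index order used by the
# trained models is not specified in the text.
CLASSES = ("abdominal", "adult_chest", "pediatric_chest", "spine", "others")


def predict_class(scores: Sequence[float]) -> str:
    """Return the name of the class with the highest score (argmax)."""
    if len(scores) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda k: scores[k])
    return CLASSES[best]


def route(scores: Sequence[float],
          handlers: Dict[str, Callable[[str], str]],
          image_id: str) -> str:
    """Send an image to the handler registered for its predicted class."""
    label = predict_class(scores)
    handler = handlers.get(label)
    if handler is None:
        # No downstream model is registered for this body part.
        return f"{image_id}: no downstream model for '{label}'"
    return handler(image_id)


# Hypothetical downstream models, represented here by simple callables.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "adult_chest": lambda image_id: f"{image_id}: sent to adult chest model",
    "spine": lambda image_id: f"{image_id}: sent to spine model",
}
```

In this sketch, an image whose predicted class has no registered handler (for example "others") is simply reported instead of being forwarded.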
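In this task, we aim to build a learning model $f_\theta$ that classifies new, unseen examples accurately [2]. This can be done by training a deep CNN that learns a non-linear mapping from the input $x^{(i)} \in \mathbb{R}^n$ to an output $f_\theta(x^{(i)}) \in \mathbb{R}^K$, from which the label $y^{(i)}$ is predicted. One common solution to train the network is to minimize the softmax cross-entropy loss
$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log \sigma\big(f_\theta(x^{(i)})\big)_{y^{(i)}}$$
over all N training examples. Here the standard softmax function $\sigma : \mathbb{R}^K \to [0, 1]^K$ is defined by the formula
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
for $i = 1, \dots, K$ and $z = (z_1, \dots, z_K) \in \mathbb{R}^K$.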
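To make the softmax cross-entropy loss concrete, the following pure-Python sketch computes it for a small batch. It assumes the loss is averaged over the N examples and uses 0-based class indices (the text uses 1, …, K); the function names are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Softmax of a score vector, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Mean negative log-probability of the true class over the batch.

    Assumption: labels are 0-based indices and the loss is averaged over
    the batch rather than summed.
    """
    if not labels or len(logits_batch) != len(labels):
        raise ValueError("need one label per example and a non-empty batch")
    total = 0.0
    for z, y in zip(logits_batch, labels):
        m = max(z)
        # log(sum_j exp(z_j)), computed stably.
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        # -log softmax(z)[y] = log_sum_exp - z[y]
        total += log_sum_exp - z[y]
    return total / len(labels)
```

With uniform scores over K = 5 classes, each class receives probability 1/5 and the loss equals log 5, which is the value a model reaches before it has learned anything.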
2.2. Data collection and annotation
The dataset used in the study was collected from the Picture Archiving and Communication System (PACS) of several major hospitals. The ethical clearance of this study was approved by the IRB of each hospital before any research activities. All patient-identifiable information in the data has been removed. The need for obtaining informed patient consent was waived because this study did not impact clinical care or workflow at the hospital. We recruited a group of human readers to participate in our labeling process. Specifically, all X-ray scans were manually reviewed and classified case-by-case into five groups: abdominal X-ray, adult chest X-ray, pediatric chest X-ray, spine X-ray, and others. In particular, each example was manually classified in two rounds by two different readers. In total, 16,093 images were collected and manually categorized. We used a stratified random sampling method to divide the dataset into training, validation, and test sets with respective ratios of 0.7/0.15/0.15. As a result, 11,263 images were used to train the deep learning algorithms, and 2,411 and 2,419 images were used as validation and test sets, respectively, for evaluating the algorithms. Each image was then stored in PNG format and rescaled to 512 × 512 pixels. Table 1 below summarizes the data sets used in this study.
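A stratified split of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the seed, and the per-class rounding are assumptions, so the resulting split sizes may differ by a few images from those reported above.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable],
                     ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split example indices into train/val/test, class by class.

    Each class is shuffled and cut separately so that every split keeps
    roughly the same class proportions as the full dataset.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        # Whatever remains goes to the test split.
        test.extend(indices[n_train + n_val:])
    return train, val, test
```

Splitting per class rather than over the whole dataset matters most for the smaller groups, which could otherwise be under-represented in the validation or test set.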
Table 1: Details of training, validation, and test data sets used in this study. To the best of our knowledge, this is the largest X-ray dataset for human body part classification tasks to date.
2.3. Deep learning algorithms
To classify body parts from X-ray images, we exploited state-of-the-art, lightweight CNNs that have achieved remarkable performance on many image classification tasks, including MobileNet-V1 [6], MobileNet-V2 [13], ResNet-18 [5], ResNet-34 [5], and EfficientNet-B0/B1/B2 [14]. We followed the original implementations [5, 6, 13, 14] with minor modifications. Specifically, we replaced the last fully connected layer of each architecture with a new layer of 5 neurons, corresponding to the number of body part groups. During the training stage, we rescaled all training images to 512 × 512 pixels. All models were trained using the cross-entropy loss function with the Adam optimizer [8]. The initial learning rate was set to 1 × 10⁻⁴ and was then scheduled to simulate warm restarts [9]. All networks were trained for 100 epochs using PyTorch (v1.7.0) on a machine with one RTX 2080 Ti GPU.
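The learning-rate schedule can be illustrated with a short sketch. It assumes that the warm restarts of [9] follow the usual cosine annealing form; the cycle length, minimum learning rate, and cycle multiplier below are hypothetical, since the text gives only the initial learning rate and the number of epochs.

```python
import math


def warm_restart_lr(epoch: int,
                    base_lr: float = 1e-4,
                    min_lr: float = 0.0,
                    first_cycle: int = 10,
                    cycle_mult: int = 1) -> float:
    """Cosine-annealed learning rate with warm restarts, per epoch.

    Assumed values: min_lr, first_cycle and cycle_mult are illustrative;
    only base_lr = 1e-4 comes from the text.
    """
    if epoch < 0 or first_cycle < 1 or cycle_mult < 1:
        raise ValueError("epoch must be >= 0; cycle settings must be >= 1")
    cycle_len = first_cycle
    t_cur = epoch
    # Find the position of this epoch inside its current cycle.
    while t_cur >= cycle_len:
        t_cur -= cycle_len
        cycle_len *= cycle_mult
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Learning rate for each of the 100 training epochs under these assumptions.
schedule = [warm_restart_lr(epoch) for epoch in range(100)]
```

Within each cycle the rate decays from `base_lr` towards `min_lr`, then jumps back to `base_lr` at the restart. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` provides a schedule of this form.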
3. Experiments and Results
3.1. Experimental setup and evaluation metrics
We evaluated the performance of the proposed models on an internal test set (N = 2,419) and an external test set (N = 1,000) using precision, recall, F1-score, and mean inference time per image (in seconds, on GPU). Using the final predictions provided by the models and the ground truth labels, we calculated the true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) as defined in Table 3.
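The precision, recall and F1-score were then computed by
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$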
For each measure, we estimated a 95% bootstrap confidence interval with 10,000 iterations.
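The evaluation can be sketched as follows. Two points are assumptions, since the text does not state them: the per-class scores are macro-averaged over the five classes, and the confidence interval is a percentile bootstrap over resampled test images. Function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int],
                 num_classes: int = 5) -> Tuple[float, float, float]:
    """Macro-averaged (precision, recall, F1) from per-class TP/FP/FN."""
    per_class: List[Tuple[float, float, float]] = []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        if tp + fp + fn == 0:
            continue  # class absent from labels and predictions: skip it
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    if not per_class:
        raise ValueError("no examples to score")
    n = len(per_class)
    return (sum(s[0] for s in per_class) / n,
            sum(s[1] for s in per_class) / n,
            sum(s[2] for s in per_class) / n)


def bootstrap_ci(y_true: Sequence[int], y_pred: Sequence[int],
                 metric_index: int = 2, iterations: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for one metric (0=P, 1=R, 2=F1)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(iterations):
        # Resample test images with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        scores = macro_scores([y_true[i] for i in sample],
                              [y_pred[i] for i in sample])
        stats.append(scores[metric_index])
    stats.sort()
    lower = stats[int((alpha / 2) * iterations)]
    upper = stats[min(iterations - 1, int((1 - alpha / 2) * iterations))]
    return lower, upper
```

For a test set of 2,419 images, 10,000 resamples in pure Python are slow; a vectorized implementation would be preferable in practice, but the logic is the same.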
3.2. Model performance on internal test set
Table 2 summarizes quantitative results for all the classification models. Deep CNNs showed excellent performance on the 2,419 images of the internal test set. Specifically, our best performing model (i.e., MobileNet-V1 [6], 3.2M parameters) achieved a recall of 0.982 (95% CI, 0.977–0.988), a precision of 0.981 (95% CI, 0.975–0.987) and an F1-score of 0.981 (95% CI, 0.976–0.987), whilst requiring less computation for inference (0.0295 seconds per image).
Table 2: Classification performance of different network architectures on the test set. Inference time (in seconds) is measured on an RTX 2080 Ti GPU machine. Best results are in red.
Confusion matrix
3.3. Model performance on external test set
The domain shift across different hospital settings is the main obstacle in transferring deep learning models into clinical practice [11]. It can result in poor generalization and decreased accuracy [4]. To investigate the generalization ability of the proposed approach across multiple data sources, we performed an external validation test on 1,000 X-ray images collected from another patient cohort. The best-performing model, MobileNet-V1 [6], was used for this experiment. It achieved a recall of 0.9712, a precision of 0.9738, and an F1-score of 0.9725. This high classification accuracy indicates the robustness of the system across different patient cohorts, scanner vendors, and imaging protocols without additional training cost.
4. Conclusions
This work developed and validated a deep learning-based DICOM Imaging Router to classify body parts from X-ray images. A benchmark dataset with 16,093 X-ray images of body parts has been introduced. Experiments demonstrated the effectiveness of the proposed method. The DICOM Imaging Router can be applied to many real-world applications in radiology. For example, it can be integrated into a PACS to help radiologists find and classify X-ray images quickly and accurately for interpretation. The system can also serve as a pre-filter for other AI applications. Our trained models and the dataset used in this study will be opened for further development and deployment. For future work, we plan to conduct more experiments and evaluate the impact of the proposed framework in real-world clinical settings.
Data Availability
The dataset used in this study will be made publicly available from our project website at https://vindr.ai/.