Abstract
Introduction: Machine learning and artificial intelligence (AI) models have been applied in histopathology to solve specific problems, such as detection of metastasis in lymph nodes and immunohistochemical scoring. We aimed to develop a machine learning model that can be trained in histopathology from the basics, i.e. the identification of normal tissue. We have tried to replicate the process by which a human pathologist learns to recognise normal tissue in histological sections, and to evaluate the performance of a machine learning model at this task.
Materials and methods: Archived material of the hospital was used for the study. A total of 658 anonymised histologic images were microphotographed at 10x magnification, under identical illumination, with a Magnus DC5 integrated microphotography system. The images belonged to seven classes of tissue: brain, intestine, kidney, liver, lungs, muscle and skin, and were split into two subsets, training (386 images) and validation (272 images). A machine learning model based on a convolutional neural network (CNN) was developed on the Keras platform, using the convolutional layers of a pretrained VGG16 model. The model was trained on the training set for 10 epochs, after which its performance was assessed on the validation set.
Results: The model achieved 88.24% accuracy in classifying the images of the validation set. Errors were most frequent in recognising images of kidney (14 errors, 33.33%); the commonest error was wrongly classifying kidney tissue as liver (7 errors). Analysis of the deeper layers of the neural network revealed specific patterns in the images that were wrongly classified.
Conclusion: The results of the present study indicate that a convolutional neural network might be trained in histology in a manner similar to a trainee pathologist. The study represents a first step towards developing a machine learning model as a generalised histopathological image classifier.
Introduction
Histopathology is regarded as the final bastion of pathology and the gold standard of diagnostic testing. However, interpretation of histopathological sections is inherently observer-dependent, and years of training are required to gain expertise in histopathology. The early part of this training is spent in long hours looking at sections of normal tissue and building a generalised mental representation of each kind of tissue. The trainee pathologist must learn to recognise and classify normal tissue, and this comprises a significant proportion of his or her training in pathology.
The identification of tissues from histological sections has long been held to be the domain of professionally trained human beings. However, present-day machine learning models have matured to the point where machines can take up the task of identifying histological sections.
Machine learning models have successfully classified real-world images from a very large dataset.[1] Such models have been able to classify images into multiple classes, e.g. animals, cars, people, street signs and objects of daily use. The basis for such models is usually an artificial neural network (ANN).[2] ANNs are a departure from conventional image analysis techniques. An ANN is constructed of multiple feed-forward layers of simpler units (‘neurones’), each of which takes an input number x and converts it to a non-linear output y:
$$y = f(wx + b)$$
where ‘w’ and ‘b’ are the two parameters, ‘weight’ and ‘bias’, of the particular neurone, and f is a non-linear function. A neural network is made of several such layers of individual neurones. Over repeated epochs of training, the network adjusts its parameters so as to produce the correct result the majority of the time.
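As a minimal illustration of the neurone described above, the sketch below computes y = f(wx + b) in plain Python; it is not code from this study. ReLU is shown as the default non-linearity because it is the activation used in VGG16's layers, and the sigmoid is included as another common choice. In a real network each neurone receives many inputs, so w becomes a vector of weights and wx a dot product, as in the hypothetical `dense_layer` helper.

```python
import math


def relu(z):
    """Rectified linear unit: the non-linearity used in VGG16's layers."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, another common choice for f."""
    return 1.0 / (1.0 + math.exp(-z))


def neurone(x, w, b, f=relu):
    """A single neurone: weight the input, add the bias, apply f."""
    return f(w * x + b)


def dense_layer(inputs, weights, biases, f=relu):
    """A layer of neurones. Each neurone has one weight per input, so w*x
    becomes a dot product; `weights` holds one row per neurone."""
    return [
        f(sum(w_i * x_i for w_i, x_i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]
```

For example, `neurone(2.0, 0.5, -0.5)` returns 0.5, while `neurone(2.0, -0.5, 0.0)` returns 0.0 because ReLU clips the negative weighted sum to zero.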
For image recognition, a specialised class of ANNs, the convolutional neural network (CNN), is widely used; the theory of convolutional neural networks has been described in detail by Karpathy et al.[3] CNNs have been applied successfully in several areas of histopathology, such as classification of histopathologic patterns of lung adenocarcinoma[4], scoring of immunohistochemical staining in breast cancer[5], Gleason scoring of prostate cancer[6] [7], mitotic counting in breast cancer[8], assessment of tumor proliferation, budding and lymphatic vessel density[9] [10], detection of metastasis in lymph nodes[11] and gland segmentation[12].
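The core operations of a CNN can be sketched in a few lines of plain Python: a small kernel slides over the image and a weighted sum is taken at each position, and a pooling step then halves the spatial resolution. This is an illustrative sketch, not the Keras implementation. The kernel here is a hand-picked vertical-edge detector, whereas a CNN learns its kernel weights during training; and, like deep learning libraries, the function computes a cross-correlation (the kernel is not flipped), which is conventionally still called convolution.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation), stride 1, no padding:
    slide the kernel over the image and take the weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value in each block,
    halving height and width (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# A 4x4 image whose right half is bright, and a vertical-edge kernel
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, edge_kernel)
```

Here `feature_map` is `[[0.0, 2.0, 0.0]]` repeated for three rows: the kernel responds only in the column where intensity changes, and `max_pool2x2(feature_map)` condenses it to `[[2.0]]`.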
Such studies have achieved varying levels of success, each in solving a particular problem. We have taken the systematic approach of training a machine learning model from the ground up, similar to the way a medical student is trained. Our aim was to develop and test a CNN that can classify normal histological images into seven categories: brain, lung, skin, liver, kidney, muscle and intestine. A similar study by Kieffer et al on grayscale images achieved 74.87% accuracy on 24 classes[13], using the feature vector from the last layer of a pretrained neural network.
Materials and methods
We photographed histopathologic slides from seven classes of tissue: brain, intestine, kidney, liver, lungs, muscle and skin. The histologic sections were retrieved from the archives of the laboratory. All sections had previously been stained with hematoxylin and eosin (the staining method is described in Table 1).
Staining protocol
Distribution of images in training and validation set
Performance of the model on the validation set
A total of 658 foci were microphotographed at 10x magnification (numerical aperture 0.25), under identical illumination, with a Magnus DC5 integrated microphotography system. The images were split into two subsets, training (386 images) and validation (272 images). The distribution of images across the two subsets is shown in the accompanying table.
After collection of the images, a machine learning model was developed with the Python[14] programming language and the Keras deep learning library. The model consisted of a pretrained image recognition model, VGG16, with its fully connected layers modified to suit the classification problem.[2] [15] The final model consisted of 26 layers of neurones and 2,626,055 trainable parameters (i.e. weights and biases). It accepted a color image of 256 x 192 pixels as input and produced a single number from 0 to 6 as output, corresponding to the seven classes of images.
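Two details of this architecture can be checked with simple arithmetic, sketched below in plain Python under stated assumptions. First, VGG16's convolutional base contains five 2x2, stride-2 max-pooling layers, so each image dimension is divided by 32: reading 256 x 192 as width x height, the base yields a 6 x 8 x 512 feature map, and either reading gives 24,576 values reaching the modified fully connected layers. Second, a seven-class classifier typically ends in seven scores that a softmax converts into probabilities, and the single number from 0 to 6 is the index of the highest one. The class order in the hypothetical `CLASSES` tuple is an assumption (alphabetical, as Keras assigns label indices when classes are inferred from sorted sub-directory names); the paper does not state it, and the exact layers of the modified head are not reproduced here.

```python
import math

# VGG16's convolutional base: five blocks of 3x3, 'same'-padded convolutions
# (which keep height and width), each block ending in 2x2, stride-2 max pooling.
VGG16_BLOCK_FILTERS = (64, 128, 256, 512, 512)


def vgg16_base_output_shape(height, width):
    """Feature-map shape (H, W, channels) after the VGG16 conv base."""
    for _ in VGG16_BLOCK_FILTERS:
        height, width = height // 2, width // 2  # each pooling halves H and W
    return height, width, VGG16_BLOCK_FILTERS[-1]


# Hypothetical label order (alphabetical); the paper does not state it.
CLASSES = ("brain", "intestine", "kidney", "liver", "lungs", "muscle", "skin")


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Reduce the seven output scores to a single index 0-6 and its label."""
    probs = softmax(scores)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, CLASSES[index]
```

For example, `vgg16_base_output_shape(192, 256)` returns `(6, 8, 512)`, and `predict_class([0.1, 0.2, 3.0, 0.5, 0.0, -1.0, 0.3])` returns `(2, "kidney")` under the assumed label order.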
The model was trained on the Google Colab platform[16] for 10 epochs, i.e. each training image was shown to the model 10 times. Images were resized to 256 x 192 pixels before training. During training, the model adjusted its parameters at each epoch to minimise the loss function, a measure of its prediction error.
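What minimising the loss over 10 epochs means can be illustrated with a toy example: a single sigmoid neurone trained by full-batch gradient descent on four made-up one-dimensional points, with binary cross-entropy as the loss. This is a deliberate simplification, not the training setup of the study: the actual model had seven outputs (for which categorical cross-entropy is the usual loss), millions of parameters, and an optimiser the paper does not report. In Keras, the equivalent loop is run by `model.fit(..., epochs=10)`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, epochs=10, lr=0.5):
    """Full-batch gradient descent on one neurone, y_hat = sigmoid(w*x + b),
    minimising mean binary cross-entropy. Returns (w, b, per-epoch losses)."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):  # one epoch = one pass over every training example
        preds = [sigmoid(w * x + b) for x in xs]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(preds, ys)) / n
        # Gradients of the mean cross-entropy for a sigmoid output
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w  # step against the gradient to reduce the loss
        b -= lr * grad_b
        history.append(loss)
    return w, b, history


w, b, losses = train([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

Starting from w = b = 0, the first recorded loss is ln 2 (about 0.693), and with this small learning rate the loss falls at every one of the 10 epochs.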
After completion of training, the performance of the model was assessed over the validation set.
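Assessing the model on the validation set amounts to comparing each true label with the predicted label. A minimal sketch using only the Python standard library computes overall accuracy, a confusion matrix (rows are true classes, columns are predicted classes) and the number of errors per class. The labels in the usage example are made up for illustration; scikit-learn's `accuracy_score` and `confusion_matrix` provide the same calculations.

```python
from collections import Counter


def evaluate(y_true, y_pred, classes):
    """Return overall accuracy, a confusion matrix as nested dicts
    (matrix[true][predicted] = count) and per-class error counts."""
    pairs = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    matrix = {t: {p: pairs[(t, p)] for p in classes} for t in classes}
    correct = sum(matrix[c][c] for c in classes)  # the diagonal
    accuracy = correct / len(y_true)
    errors = {c: sum(matrix[c].values()) - matrix[c][c] for c in classes}
    return accuracy, matrix, errors


# Made-up example: one kidney image misread as liver
y_true = ["kidney", "kidney", "liver", "brain"]
y_pred = ["liver", "kidney", "liver", "brain"]
accuracy, matrix, errors = evaluate(y_true, y_pred, ["brain", "kidney", "liver"])
```

In this example `accuracy` is 0.75, `matrix["kidney"]["liver"]` is 1, and all errors fall in the kidney row, which is how per-class error counts such as those in the Results are read off.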
Results
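The performance of the model on the validation set is summarised in the accompanying table. Overall, the model correctly classified 88.24% of the 272 validation images.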
Accuracy was highest for the classes ‘brain’ and ‘liver’. Figures 1-4 show a few images correctly identified by the model.
Muscle tissue correctly identified by the model
Kidney correctly identified by the model
Intestine correctly identified by the model
Liver correctly identified by the model
Discussion
The analysis of histologic images is a non-trivial machine learning problem because of the inherent variability of biological tissues. No two foci from any tissue are exactly alike; even in a homogeneous organ with a well-defined, compact lobular architecture such as the liver, the arrangement of hepatocytes around central veins, radiating towards portal triads, varies significantly from focus to focus. This variability in architecture is present in all bodily tissues. For example, in the intestines, the villus:crypt ratio may vary significantly along the length of the gastrointestinal tract.[17] Similarly, depending on the site of biopsy, the skin may show variable thickness of the epidermis and stratum corneum. A learning model, whether human or machine, must take all this morphologic variability into consideration and learn the essential histomorphologic features that give each tissue its identity.
The difficulties in histologic image analysis have been enumerated by Komura et al.[18] One of the major difficulties in Whole Slide Image (WSI) analysis is the lack of labeled images: a pathologist must manually label a region of interest (ROI) on a WSI for the machine to train on. Instead of whole slide images, we used random foci from each slide to train the machine learning model, and the entire image, not just an ROI, was used as input to the model. Another difficulty is magnification, because the same tissue may show different histologic patterns at different magnifications. We photographed all images at 10x magnification so that the model would learn a consistent histologic pattern.
A similar study by Kieffer et al used grayscale histopathology images of 1000 x 1000 pixels belonging to 24 classes, and achieved 74.87% accuracy[13] using the feature vector from the last layer of a pretrained neural network. They concluded that the performance of a pretrained network and of a network built specifically for the task were comparable. We used color images and a pretrained CNN (VGG16), but altered its final, fully connected layers so that it produces one of seven outputs. This model achieved 88.24% accuracy, although on a smaller dataset than that of Kieffer et al.
Analysis of the predictions shows that the model wrongly classified 14 images of kidney, predicting them as ‘liver’ (7), ‘lungs’ (3) or ‘muscle’ (4). This may be attributable to the deeply eosinophilic renal tubules in these images, as well as to overfitting on the ‘liver’ class. Kidney tissue is readily recognisable to human observers by its prominent glomeruli, even when only a single glomerulus is present. The fact that the model often missed kidney tissue indicates a difference between how a human and a machine perceive a histologic image.
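The reported error figures can be reconciled with simple arithmetic. The number of kidney images in the validation set (42) and the number of correct predictions (240) are not stated in the text; they are the only whole numbers consistent with the reported percentages, assuming that 33.33% is the error rate within the kidney class and that 88.24% is computed over all 272 validation images.

$$
\begin{aligned}
\text{kidney errors} &= 7\ (\text{liver}) + 3\ (\text{lungs}) + 4\ (\text{muscle}) = 14, \qquad \frac{14}{42} \approx 33.33\%,\\
\text{overall accuracy} &= \frac{240}{272} \approx 88.24\%, \qquad 272 - 240 = 32 \text{ errors in total}.
\end{aligned}
$$

On this reading, kidney images account for 14 of the 32 errors (43.75%), consistent with kidney being the class with the most frequent errors.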
Analysis of the deeper layers of the model reveals a pattern. Figures 10-12 show example images from the validation set together with the first three slices (feature maps) of each of the first four layers of the CNN. These intermediate layers illustrate the process of convolution, and how the features of the image are condensed into a simplified array of numbers. Figures 10 and 11, which contain at least three full or partial glomeruli, were correctly classified as kidney, whereas Figure 12, which contains only one small glomerulus, was classified as liver. Interestingly, the artifactual tear introduced in the tissue during sectioning (Figure 11) is lost over successive intermediate layers, suggesting that the CNN is not strongly affected by minor artifacts introduced during sectioning.
Muscle tissue wrongly classified by the model as kidney
Intestine wrongly classified by the model; possibly because the pattern of an epithelial layer overlying a fibrovascular stroma is common to both skin and intestine
Kidney wrongly classified as liver; the single small glomerulus seems to have been ignored by the model
Kidney wrongly classified as muscle; the deeply eosinophilic staining produces a wrong impression of muscle tissue
The model fails to classify this image (from kidney) in any definite category; there is deep eosinophilia resembling muscle, but also artifactual blank spaces – similar to alveoli in lungs
Inner layers of the network while correctly classifying an image from kidney; the glomeruli are well preserved till the third layer
Another image from kidney, correctly classified, with activations in inner layers; note the presence of glomerular structures in the third layer
Kidney wrongly classified as liver, with activations in inner layers; the single small glomerulus is not preserved till the third layer
Figure 14 shows an image (muscle) that was wrongly classified by the CNN as ‘intestine’. For this image, the deeper layers of the CNN produce a characteristic pattern resembling smooth muscle layered around intestinal epithelium, which might be the cause of this error. Similarly, in Figure 15, the artifactual blank spaces in the image have led to the wrong classification as ‘lung’.
Kidney wrongly classified as lung; possibly because the artifactual blank space was mistaken for an alveolus
Image from muscle tissue wrongly classified as intestine; the layered pattern of muscle fibers in this image is reminiscent of intestinal smooth muscles
Muscle tissue wrongly classified as lung; note the abundant artifactual blank spaces in the original image
Conclusion
The results of the present study indicate that a convolutional neural network might be trained in histology in a manner similar to a trainee pathologist, and that it is prone to the same kinds of error as a beginner human pathologist. The study represents a first step towards developing a machine learning model as a generalised histopathological image classifier.
Conflicts of interest
None to declare
Note
Sayak Paul contributed to this work while employed at PyImageSearch