Deep learning approach to description and classification of fungi microscopic images
Bartosz Zieliński, Agnieszka Sroka-Oleksiak, Dawid Rymarczyk, Adam Piekarczyk, Monika Brzychczy-Włoch

TL;DR
This paper presents a deep learning method that classifies fungi microscopic images, reducing diagnosis time and costs by making the last stage of biochemical testing unnecessary, thus enabling faster and cheaper fungal identification.
Contribution
The paper applies a combination of deep learning and bag-of-words methods to fungi image classification, streamlining diagnosis and reducing reliance on biochemical tests.
Findings
Reduces diagnosis time by 2-3 days
Decreases diagnostic costs
Achieves accurate fungi species classification
Abstract
Diagnosis of fungal infections can rely on microscopic examination; however, in many cases, it does not allow unambiguous identification of the species due to their visual similarity. Therefore, it is usually necessary to use additional biochemical tests. This involves additional costs and extends the identification process to up to 10 days. Such a delay in the implementation of targeted treatment has grave consequences, as the mortality rate for immunosuppressed patients is high. In this paper, we apply a machine learning approach based on deep learning and bag-of-words to classify microscopic images of various fungi species. Our approach makes the last stage of biochemical identification redundant, shortening the identification process by 2-3 days and reducing the cost of the diagnostic examination.
Table 1. Visual properties of the ten deep BoW clusters.

| Cluster No. | Brightness | Size | Shape | Arrangement | Appearance | Color | Quantity |
|---|---|---|---|---|---|---|---|
| 0 | bright | small | oval longitudinal | regular | grouped fragmentary | black pink | high |
| 1 | dark | medium | oval circular | irregular | grouped | black | low |
| 2 | dark | large | longitudinal variform | irregular | grouped fragmentary | black | medium |
| 3 | dark | medium | variform oval | irregular | grouped fragmentary | black | medium |
| 4 | dark | large | longitudinal | irregular | grouped fragmentary | black blue | medium |
| 5 | bright | small | longitudinal oval | irregular | grouped | blue purple | medium |
| 6 | dark | medium | longitudinal oval | irregular | grouped fragmentary | black | medium |
| 7 | bright | small | longitudinal oval | irregular regular | grouped fragmentary | purple | medium |
| 8 | dark | medium | longitudinal oval | irregular | grouped fragmentary | black | high |
| 9 | dark | medium | oval | irregular | grouped | black | low |
Deep learning approach to describing and classifying fungi microscopic images
Bartosz Zieliński1,2*, Agnieszka Sroka-Oleksiak3,4, Dawid Rymarczyk1,2, Adam Piekarczyk1, Monika Brzychczy-Włoch4
1 Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
2 Ardigen, 76 Podole Street, 30-394 Kraków, Poland
3 Department of Mycology, Chair of Microbiology, Faculty of Medicine, Jagiellonian University Medical College, 18 Czysta Street, 31-121 Kraków, Poland
4 Department of Molecular Medical Microbiology, Chair of Microbiology, Faculty of Medicine, Jagiellonian University Medical College, 18 Czysta Street, 31-121 Kraków, Poland
Abstract
Preliminary diagnosis of fungal infections can rely on microscopic examination. However, in many cases, it does not allow unambiguous identification of the species due to their visual similarity. Therefore, it is usually necessary to use additional biochemical tests. This involves additional costs and extends the identification process to up to 10 days. Such a delay in the implementation of targeted therapy may have grave consequences, as the mortality rate for immunosuppressed patients is high. In this paper, we apply a machine learning approach based on deep neural networks and bag-of-words to classify microscopic images of various fungi species. Our approach makes the last stage of biochemical identification redundant, shortening the identification process by 2-3 days and reducing the cost of the diagnosis.
Introduction
Yeast and yeast-like fungi are a component of the natural human microbiota [1]. However, as opportunistic pathogens, they can cause superficial and systemic infections [2]. The leading causes of fungal infections are impaired immune function and an imbalanced microbiota composition in the human body. Other risk factors for fungal infections include steroid treatment, invasive medical procedures, and long-term antibiotic treatment with broad-spectrum antimicrobial agents [3, 4, 5].
The standard procedure in mycological diagnostics begins with collecting various types of test material, such as swabs, skin lesion scrapings, urine, blood, or cerebrospinal fluid. Next, the clinical materials (marked as B in Fig. 1) are cultured directly on special media, while blood and cerebrospinal fluid samples (marked as A in Fig. 1) require prior cultivation in automated closed systems for an additional 2-3 days. The material is incubated under specific temperature conditions (usually for 2-4 days in the case of yeast-like fungi). The initial identification of fungi is based on the shape of the cells observed under the microscope, as well as on the growth rate, type, shape, color, and smell of the colonies. Such analysis allows the fungus to be assigned to a type; however, identifying the species is usually impossible because of the significant similarity between species. Therefore, further analysis consisting of biochemical tests is necessary. As a result, the entire diagnostic process, from the moment of culture to species identification, can last 4-10 days (see Fig. 1).
In this paper, we apply a machine learning approach based on deep neural networks and bag-of-words methods to classify microscopic images of various fungus species. As a result, the last stage of biochemical identification becomes unnecessary, which shortens the identification process by 2-3 days and reduces the cost of diagnosis. This accelerates the decision to introduce an appropriate antifungal drug, which can prevent the progression of the disease and shorten the patient's recovery time.
To the best of our knowledge, there are no other methods for classifying fungi species based only on microscopic images. Existing methods involve techniques such as morphological identification of the type of fungi [6], fluorescence in situ hybridization (FISH) [7], biochemical techniques, and molecular approaches such as PCR [8] and sequencing [9]. However, all of them are costly. In contrast, our method relies on basic microbiological staining (Gram staining) and a simple microscope equipped with a camera, and it takes only a few minutes, which makes it easy to apply in many laboratories.
The paper is structured as follows. First, we introduce a fungus database and describe a classification method based on deep neural networks and bag-of-words methods. Then, we present the experimental setup, the results, and our conclusions.
Materials and methods
Materials. One of the most common fungal infections is candidiasis [5], mainly caused by Candida albicans (50-70% of cases) [10]. Other species responsible for the disease are Candida glabrata [2, 3], Candida tropicalis [4], Candida krusei [11], and Candida parapsilosis [3, 4]. In high-risk patients, severe infections can also be caused by Cryptococcus neoformans [12] and by species of the genus Saccharomyces [13]. Taking these facts into consideration, we prepared a database that consists of five yeast-like fungal strains: Candida albicans ATCC 10231 (CA), Candida glabrata ATCC 15545 (CG), Candida tropicalis ATCC 1369 (CT), Candida parapsilosis ATCC 34136 (CP), and Candida lusitaniae ATCC 42720 (CL); two yeast strains: Saccharomyces cerevisiae ATCC 4098 (SC) and Saccharomyces boulardii ATCC 74012 (SB); and two strains belonging to the Basidiomycetes: Malassezia furfur ATCC 14521 (MF) and Cryptococcus neoformans ATCC 204092 (CN). All strains come from the American Type Culture Collection. The species in our database largely overlap with those most commonly responsible for fungal infections; however, the two sets are not identical due to the limitations of our repository.
The strains were cultured on Sabouraud agar at C for 48h (together with olive oil in the case of Malassezia furfur). After this time, microscopic preparations were made (2 preparations for each fungal strain) and stained using the Gram method. Images were taken using an Olympus BX43 microscope with a 100× super-apochromatic objective under oil immersion. The photographic documentation was then produced with an Olympus BP74 camera and CellSense software (Olympus).
Altogether, our Digital Images of Fungus Species database (DIFaS) contains 180 images (9 strains × 2 preparations × 10 images) of resolution with a 16-bit intensity range in every pixel. In Fig. 2, we present three random thumbnails for each of the registered strains.
Method. Deep Neural Networks (DNNs) have shown human-level performance when large amounts of training data are available; however, their application to small datasets is limited because of their large numbers of parameters. Therefore, in this work, we consider two types of domain adaptation, both based on DNN features initially pre-trained on a different task (i.e., instance classification [14]). As a baseline method, we fine-tune the classifier’s block of well-known network architectures, i.e., AlexNet [15], DenseNet169 [16], InceptionV3 [17], and ResNet [18] (with a frozen features’ block). As we show in the results, such architectures are not optimal; hence, we propose to apply the deep bag-of-words multi-step algorithm shown in Fig. 3. In contrast to the baseline methods, which apply a “shallow” neural network to previously calculated features, our strategies aggregate those features using one of the bag-of-words approaches and then classify them with a Support Vector Machine (SVM). Such a policy, previously applied to texture recognition [19] and bacteria colony classification [20], is more accurate than the baseline methods; however, it is not well known. Therefore, to make this paper self-contained, we describe its successive steps below.
To generate a robust image representation, AlexNet [15], InceptionV3 [17], or ResNet [18] pre-trained on the ImageNet [14] database is used. Another option would be to use conventional handcrafted descriptors (like ORB [21] or DSIFT [22]); however, they are usually outperformed by deep features. The considered network architectures consist of two parts: convolutional layers, which are responsible for extracting image features (the so-called features’ block), and fully connected layers, which are responsible for classification (the so-called classifier’s block). The classifier’s block cannot be used directly because it was trained on other types of images; however, the features’ block encodes more general, reusable information. Therefore, removing the classifier’s block from the network and preserving the convolutional layers allows us to generate robust image features. In the case of AlexNet, we obtain a set of points in -dimensional space, whose number depends on the input image’s resolution (e.g., in case of resolution pixels, points () are generated).
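As a minimal numpy sketch of how a features’ block output becomes a set of points, assume the network has already produced a channel-first feature map of shape (D, H, W), as PyTorch models do. Each of the H×W spatial locations yields one D-dimensional point, which is why the number of points grows with the input resolution. The random array and its sizes are hypothetical stand-ins, not the dimensions used in the paper.

```python
import numpy as np


def feature_map_to_points(feature_map: np.ndarray) -> np.ndarray:
    """Convert a (D, H, W) feature map into an (H*W, D) array of local descriptors."""
    d, h, w = feature_map.shape
    # Move channels last, then flatten the two spatial axes into one.
    return feature_map.transpose(1, 2, 0).reshape(h * w, d)


# Hypothetical stand-in for the output of a pre-trained features' block.
rng = np.random.default_rng(0)
fake_map = rng.standard_normal((8, 4, 5))  # D=8 channels on a 4x5 spatial grid
points = feature_map_to_points(fake_map)
```

Row `i * W + j` of the result is the descriptor at spatial position (i, j), so a larger input (larger H and W) simply produces more rows of the same width.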
Since the classified patches are always of the same size, their features’ blocks could be used directly by the classifier. However, this would lead to vast data dimensionality (i.e., the feature vector of size ), which, according to our experiments, results in a lack of generalization, primarily due to the relatively small size of the training set (100 images). Therefore, to obtain a more reliable representation of patches, we pool the acquired set of points using Bag of Words, BoW [23, 24], or its more expressive modification called Fisher Vector, FV [25]. The idea behind both of them is to aggregate a set of points (representing the patch) with a so-called codebook. The codebook is usually generated from a subset of the training data in an unsupervised manner using a clustering algorithm (e.g., k-Means or Expectation Maximization [26]). Given a codebook, the set of -dimensional points obtained with AlexNet for a particular image is encoded by assigning each point to its nearest codeword. In traditional Bag of Words, this encoding leads to a codeword histogram, i.e., a histogram in which the bin of each codeword counts the points closest to that codeword. In the case of the Fisher Vector, the clusters are replaced with a Gaussian Mixture Model (GMM), and the representation encodes the gradients of the log-likelihood with respect to the parameters of this model. In this paper, we use the terms deep Bag of Words and deep Fisher Vector to refer to these two types of pooling methods.
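The two pooling schemes can be sketched in numpy as follows. This is an illustrative implementation, not the VLFeat-based code used in the paper: the codebook comes from a plain k-Means (Lloyd's algorithm), and the Fisher Vector assumes a diagonal-covariance GMM whose parameters (`weights`, `means`, `variances`) have already been estimated, for example by EM. The power and L2 normalization at the end follow the common "improved Fisher Vector" recipe and are an assumption here. All data, sizes, and the placeholder GMM parameters in the usage lines are hypothetical.

```python
import numpy as np


def nearest_codeword(points, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each point."""
    d2 = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-Means; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    codebook = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_codeword(points, codebook)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the previous codeword if its cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def bag_of_words(points, codebook):
    """Codeword histogram, normalized so that the bins sum to 1."""
    counts = np.bincount(nearest_codeword(points, codebook), minlength=len(codebook))
    return counts / counts.sum()


def fisher_vector(points, weights, means, variances):
    """Fisher Vector of a diagonal-covariance GMM (gradients w.r.t. means and std devs).

    weights: (K,), means and variances: (K, D). Returns a vector of length 2*K*D.
    """
    n = len(points)
    std = np.sqrt(variances)
    diff = (points[:, None, :] - means[None]) / std[None]  # (N, K, D)
    # Log of weight_k * N(x | mean_k, variance_k) for every point and component.
    log_p = (np.log(weights)[None]
             - 0.5 * (diff ** 2).sum(axis=-1)
             - 0.5 * np.log(2 * np.pi * variances).sum(axis=-1)[None])
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # soft assignments, (N, K)
    g_mean = (gamma[..., None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_std = (gamma[..., None] * (diff ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mean.ravel(), g_std.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalization


# Hypothetical data: 200 training descriptors and one 20-point patch, D = 8.
rng = np.random.default_rng(0)
codebook = kmeans(rng.standard_normal((200, 8)), k=4)
patch_points = rng.standard_normal((20, 8))
bow = bag_of_words(patch_points, codebook)
# Placeholder GMM (uniform weights, k-Means centers, unit variances), not a fitted model.
fv = fisher_vector(patch_points, np.full(4, 0.25), codebook, np.ones((4, 8)))
```

Both encodings turn a variable-size set of points into a fixed-size vector: K bins for BoW and 2·K·D values for FV, which is why FV is the more expressive of the two.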
As a result of pooling, one fixed-size vector is obtained for each of the analyzed patches, which can be classified with any machine learning method to distinguish between various fungus species. We decided to use Support Vector Machine and Random Forest classifiers for this step.
Experimental setup and results
For the experiments, we split our DIFaS database (9 strains × 2 preparations × 10 images) into two subsets, so that both of them contain images of all strains, but from different preparations. This is because each preparation has its own characteristics, and according to our previous studies [20], using images from the same preparation in both the training and test sets can result in overstated accuracy. As an example, consider the background size, which depends on the size of the colony transferred with an inoculation loop from the Sabouraud agar to the preparation. Because there are only two preparations for each species in the dataset, the classifier could end up learning the clinically irrelevant background size instead of relevant fungus features. Therefore, images from a particular preparation should not be shared between the training and test sets. We decided to use 2-fold cross-validation (one fold with images from the first preparations and the second fold with images from the second preparations). Moreover, we decided to classify patches instead of whole images (see Image preprocessing for details) and introduced an additional class corresponding to the background (BG) to compensate for the influence of preparation characteristics on the final result. For each fold, we optimize the following parameters using internal 5-fold cross-validation: number of clusters in BoW ; number of clusters in FV ; SVM kernel ; SVM ; SVM . As the evaluation metric for grid search optimization, we use the accuracy classification score. The best results were obtained for FV with clusters and SVM with kernel, , and .
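The preparation-wise split can be sketched as follows. The record layout (`strain`, `preparation`, `image` keys) is hypothetical; the point is that each fold tests on one preparation of every strain and trains on the other, so no preparation ever appears on both sides. The internal 5-fold grid search would then run on the training part of each fold only.

```python
from itertools import product

# Hypothetical index of the DIFaS layout: 9 strains x 2 preparations x 10 images.
STRAINS = ["CA", "CG", "CT", "CP", "CL", "SC", "SB", "MF", "CN"]
images = [
    {"strain": s, "preparation": p, "image": i}
    for s, p, i in product(STRAINS, (1, 2), range(10))
]


def preparation_folds(records):
    """Yield (train, test) pairs; each fold tests on one preparation of every strain."""
    for test_prep in (1, 2):
        train = [r for r in records if r["preparation"] != test_prep]
        test = [r for r in records if r["preparation"] == test_prep]
        yield train, test


folds = list(preparation_folds(images))
```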
We performed all the experiments on a workstation with one 12 GB GPU and 256 GB of RAM. On average, feature extraction, pooling, and classification take from to hours when training deep Fisher Vector. Such performance was possible thanks to the adaptation of the VLFeat library [27]. For comparison, fine-tuning the well-known architectures takes from to hours (see Table LABEL:tab:test_patch_based). For the baseline methods, processing time was measured by multiplying the average time of an epoch by the number of epochs until early stopping (i.e., until the validation loss increased). For the deep bag-of-words approaches, processing time was computed as the sum of all three steps of the algorithm (i.e., obtaining the image representation, pooling, and classification).
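The timing rule for the baselines is simple arithmetic. A small sketch, assuming early stopping fires at the first epoch whose validation loss exceeds that of the previous epoch (no patience window) and that this stopping epoch is itself counted; both choices are assumptions, since the text does not specify them, and the numbers in the usage line are made up.

```python
def epochs_until_early_stop(val_losses):
    """Count epochs up to and including the first increase in validation loss."""
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] > val_losses[epoch - 1]:
            return epoch + 1
    return len(val_losses)  # no increase observed: every recorded epoch ran


def baseline_processing_time(epoch_times, val_losses):
    """Average epoch duration multiplied by the number of epochs until early stopping."""
    average_epoch = sum(epoch_times) / len(epoch_times)
    return average_epoch * epochs_until_early_stop(val_losses)


# Made-up measurements: minutes per epoch and validation losses.
minutes = baseline_processing_time([12, 12, 24, 24, 18], [1.0, 0.8, 0.7, 0.75, 0.6])
```

Here the loss first rises at the fourth epoch, so the estimate is 4 epochs × 18 minutes on average = 72 minutes.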
The remaining part of this section is structured as follows. First, we describe image preprocessing, including contrast stretching and background removal. Then, we describe the results obtained for patch-based classification using deep bag-of-words approaches and compare them with the well-known network architectures. To explain the outcomes of deep BoW, we present an in-depth explanatory analysis of the obtained codebooks together with microbiological feedback. We continue this investigation for the deep FV approach. Finally, we present results obtained for scan-based classification, computed by aggregating patch-based scores. The code, implemented in Python with the PyTorch library, is available at https://github.com/bziiuj/fungus.
Image preprocessing
The DIFaS database contains images of relatively high resolution and intensity range (from 0 to 65535); however, the actual pixel values usually lie between 0 and 1000 (see Fig. 4a). Therefore, in the first step of preprocessing, we compute lower and upper intensity limits (separately for every image) and use them for contrast stretching (see Fig. 4b). Moreover, images are scaled to the range .
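Per-image contrast stretching can be sketched as below. The text does not say how the lower and upper limits are chosen, so this sketch assumes they are the 1st and 99th intensity percentiles of each image, and it maps the result to [0, 1] as an illustrative choice; both the percentiles and the target range are assumptions, and the scan is synthetic.

```python
import numpy as np


def contrast_stretch(image, low_pct=1.0, high_pct=99.0):
    """Stretch one image between its own percentile limits and clip to [0, 1]."""
    low, high = np.percentile(image, [low_pct, high_pct])
    stretched = (image.astype(np.float64) - low) / max(high - low, 1e-12)
    return np.clip(stretched, 0.0, 1.0)


rng = np.random.default_rng(0)
# Synthetic 16-bit scan whose values occupy only 0..1000 of the 0..65535 range.
scan = rng.integers(0, 1001, size=(64, 64)).astype(np.uint16)
out = contrast_stretch(scan)
```

Using percentiles rather than the raw minimum and maximum keeps a few extreme pixels from compressing the rest of the intensity range.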
To overcome the issues with preparation characteristics (e.g., background size), in the second step of preprocessing we extract and classify only image patches with a reasonable foreground-to-background proportion (FBP), i.e., patches with a reasonable number of foreground pixels. To obtain a pixel-level foreground-background segmentation, we apply thresholding (with threshold equal ) to a grayscaled and blurred version of the scanned image. Such a simple segmentation is sufficient and works for all the images in the dataset (see Fig. LABEL:fig:thumbnails_train and LABEL:fig:thumbnails_test) because the background is always much brighter than the areas with fungi cells. We tested three possible FBP options: , , and (see Fig. 4d-f). Based on empirical studies, we decided to use FBP equal , which gains around compared to the other options. As a result, we obtain rough segmentations with approximate locations of foreground patches (those with FBP greater than ) and background patches (those with FBP smaller than ). Additionally, we experimented with two image scales: the original images and images scaled by factor (with bicubic interpolation), concluding that the latter gains around 4% compared to the former.
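The segmentation and patch selection steps can be sketched in numpy as follows. Several details are assumptions, not taken from the paper: grayscale is the channel average, the blur is a box (mean) filter, FBP is computed as the fraction of foreground pixels in a patch, and the threshold, blur radius, patch size, stride, and FBP cut-off are arbitrary illustrative values. Because the background is brighter than the cells, foreground pixels are those below the threshold.

```python
import numpy as np


def box_blur(gray, radius=1):
    """Mean filter via an integral image (borders padded by edge replication)."""
    k = 2 * radius + 1
    padded = np.pad(gray, radius, mode="edge")
    integral = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = gray.shape
    window = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
              - integral[k:k + h, :w] + integral[:h, :w])
    return window / (k * k)


def foreground_mask(rgb, threshold, radius=1):
    """Cells are darker than the background, so keep blurred pixels below threshold."""
    gray = rgb.mean(axis=-1)  # simple grayscale: channel average (assumption)
    return box_blur(gray, radius) < threshold


def split_patches(mask, size, stride, min_fbp):
    """Top-left corners of foreground and background patches (FBP = foreground fraction)."""
    fg, bg = [], []
    h, w = mask.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            fbp = mask[y:y + size, x:x + size].mean()
            if fbp > min_fbp:
                fg.append((y, x))
            else:
                bg.append((y, x))
    return fg, bg


# Synthetic scan: bright background with one dark 32x32 region of "cells".
rgb = np.full((64, 64, 3), 0.9)
rgb[:32, :32] = 0.1
mask = foreground_mask(rgb, threshold=0.5)
fg, bg = split_patches(mask, size=16, stride=16, min_fbp=0.5)
```

A stride smaller than the patch size yields overlapping patches; the non-overlapping grid above only keeps the example easy to check.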
Patch-based classification
In this experiment, we use baseline models (well-known network architectures) as well as deep Bag of Words and deep Fisher Vector models to classify each patch of the image separately. As baseline models, we fine-tune the classifier’s block of well-known network architectures, such as AlexNet [15], DenseNet169 [16], InceptionV3 [17], and ResNet [18], for 100 epochs (with a frozen features’ block). Every baseline model was previously pre-trained on the ImageNet database [14]. Before running all the experiments, we chose the optimal FBP (), patch size ( pixels), and image scale () using grid search optimization. We apply data augmentation (rotations, mirror reflections, and random noise) for better regularization.
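The three augmentation types can be sketched as below. Only the types come from the text; restricting rotations to multiples of 90 degrees, mirroring with probability 0.5, and Gaussian noise with a standard deviation of 0.01 are assumptions, and the patch is random data in [0, 1].

```python
import numpy as np


def augment(patch, rng, noise_std=0.01):
    """Random 90-degree rotation, optional mirror reflection, additive Gaussian noise."""
    out = np.rot90(patch, k=rng.integers(4), axes=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal mirror reflection
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)  # keep values in the stretched intensity range


rng = np.random.default_rng(0)
patch = rng.random((32, 32, 3))  # hypothetical square RGB patch
augmented = [augment(patch, rng) for _ in range(4)]
```

Multiples of 90 degrees keep a square patch square without interpolation, which fits microscopy images that have no preferred orientation.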
The overall comparison of the tested methods is presented in Table LABEL:tab:test_patch_based. One can observe that deep Fisher Vector works better than all the other techniques, including deep Bag of Words. However, its accuracy drops dramatically in the case of Candida glabrata (CG) and Cryptococcus neoformans (CN). For CN, this is most probably caused by a reduced number of samples, while for CG it is due to its larger variance in arrangement, appearance, and quantity (especially between the two preparations, see Fig. LABEL:fig:thumbnails_train and Fig. LABEL:fig:thumbnails_test). Moreover, CG images are hard to classify due to partial discoloration (pink instead of purple) and extensive overlapping of cells. As a result, CG is often classified as Candida lusitaniae (CL), which belongs to the same genus (see the confusion matrix in Fig. 5b). We expect the classification error to decrease if the biological material in the microscopic preparation has the lowest possible density, with separated cells, as overlapping is the leading cause of blurriness.
To further understand the reasons for incorrect classification, we prepare a qualitative confusion matrix for deep Fisher Vector that shows examples of correctly and incorrectly classified patches (see Fig. LABEL:fig:fungus_cm). We observe a high morphological similarity between misclassified species belonging to the genera Candida, Cryptococcus, and Saccharomyces, especially if the preparation with the biological material is discolored. Moreover, one can notice that deep Fisher Vector can return two different results for two highly overlapping patches from the same scan. This is usually caused by artifacts in the background, such as the purple trail in Candida lusitaniae (CL), predicted as Cryptococcus neoformans (CN) or Malassezia furfur (MF) in Fig. LABEL:fig:fungus_cm. Other incorrect classifications arise from a small number of incomplete (fragmented) cells (see Candida glabrata (CG) predicted as CN in Fig. LABEL:fig:fungus_cm).
Analysis of deep Bag of Words clusters
In this section, we first analyze the deep Bag of Words pooling step by visualizing clusters using the patches nearest to their centroids. Then, based on those patches, we introduce a description of the considered species using properties pre-defined by the microbiologists. Finally, we present the mean deep BoW for every species. To make our analysis clearer, in this section we limit deep BoW to 10 clusters, although the optimal number obtained with grid search optimization is . Moreover, the presented properties are introduced only to explain the intrinsic rules of the method. They are not used in the automatic classification, which requires only a scan image as input.
The ten patches nearest to each of the ten deep BoW centroids obtained with the k-Means algorithm are presented in Fig. 6. One can observe that they share common features and can therefore be used to determine which visual properties are essential for the classifier. We consider the following properties (see Table 1): brightness (dark or bright), size (small, medium, or large), shape (circular, oval, longitudinal, or variform), arrangement (regular or irregular), appearance (singular, grouped, or fragmentary), color (pink, purple, blue, or black), and quantity (low, medium, or high). As a result, the standard set of parameters used to describe the species (size, shape, arrangement, and appearance) was significantly extended.
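Finding the patches shown for each centroid amounts to a nearest-neighbor lookup. A numpy sketch, assuming each row of `descriptors` is tied to an image crop that can be displayed (how descriptors are mapped back to image regions is not covered here); the data are random placeholders, and three centroids are copied from known rows so that the result is easy to check.

```python
import numpy as np


def nearest_patches(descriptors, centroids, n_neighbors=10):
    """For every centroid, indices of the n descriptors closest to it (Euclidean)."""
    d2 = ((centroids[:, None, :] - descriptors[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :n_neighbors]


rng = np.random.default_rng(0)
descriptors = rng.standard_normal((100, 8))  # hypothetical patch representations
centroids = descriptors[[5, 42, 77]].copy()  # placeholder centroids
idx = nearest_patches(descriptors, centroids)
```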
To investigate which visual properties are essential for the classifier, we calculate the mean deep Bag of Words representation for every species (see Fig. 7) and then examine how the visual information about their main clusters corresponds to the knowledge of a microbiologist. Her main conclusions are as follows:
- species of the genus Candida mainly belong to cluster 2, with black cells of medium or large size and oval or longitudinal shape;
- Malassezia furfur has been assigned to clusters 0, 2, 5, and 8, mostly representing black, longitudinal cells of various sizes;
- Saccharomyces boulardii and Saccharomyces cerevisiae are mainly described by clusters 1, 2, 4, and 8, which are characterized by black color, medium or large size, and longitudinal shape;
- Candida tropicalis and Saccharomyces cerevisiae have very similar mean Bag of Words, which confirms the high morphological similarity described in [28] (i.e., size -- , oval, elongated shape, occurring singly or in small groups).
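The per-species mean BoW can be computed by averaging patch histograms per label. The text compares the means visually; the cosine similarity below is an added assumption that simply puts a number on "very similar". The histograms are made-up three-bin examples, not values from the paper.

```python
import numpy as np


def mean_bow_per_species(histograms, labels):
    """Average the BoW histograms of all patches belonging to each species."""
    return {str(s): histograms[labels == s].mean(axis=0) for s in np.unique(labels)}


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 3-bin histograms for two patches per species.
hist = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
                 [0.55, 0.35, 0.1], [0.6, 0.3, 0.1],
                 [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]])
labels = np.array(["CT", "CT", "SC", "SC", "MF", "MF"])
means = mean_bow_per_species(hist, labels)
sim_ct_sc = cosine_similarity(means["CT"], means["SC"])
sim_ct_mf = cosine_similarity(means["CT"], means["MF"])
```

In this toy example, CT and SC put most of their mass on the same bins, so their similarity is close to 1, while CT and MF differ sharply.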
Analysis of deep Fisher Vector and SVM classifier
In this section, we first analyze the power of the deep Fisher Vector representation by projecting it onto a 2D plane with the t-SNE algorithm [29]. Then, we analyze classifier certainty based on the scores obtained for various patches.
The projection of the high-dimensional deep Fisher Vector to 2D using the t-SNE algorithm is presented in Fig. 8. One can observe that classes are generally well separated in the case of the AlexNet and ResNet18 architectures. Nevertheless, contrary to our expectations, species of the same genus are not grouped more closely than species of different genera. Moreover, one can observe that InceptionV3 fails to adapt to the domain of microbiological images, which explains its results in Table LABEL:tab:test_patch_based.
The second task in this section was to analyze classifier certainty. For this purpose, we investigate the distance of the patch representations from the classifier hyperplane, which roughly corresponds to how certain the classifier is of its decisions. The leftmost and rightmost patches in Fig. LABEL:fig:svm_analysis are correctly classified with high confidence, while the ones in the middle are ambiguous. The most representative Malassezia furfur (MF) cells have an oval, longitudinal shape and often occur in the budding form, in which the daughter cells are as wide as the parent cells. In contrast, Saccharomyces cerevisiae (SC) cells have round shapes, are larger than those of Candida albicans (CA), and are arranged individually or in small groups.
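For a linear SVM, the signed distance of a representation x from the hyperplane w·x + b = 0 is (w·x + b)/‖w‖, and sorting patches by it gives the left-to-right ordering described above. A minimal numpy sketch for a single binary (e.g., one-vs-rest) decision; the weights and points are hypothetical, and for a kernel SVM the decision-function value plays the same role, being proportional to the distance in the kernel's feature space rather than in the input space.

```python
import numpy as np


def signed_distances(x, w, b):
    """Signed distance of each row of x to the hyperplane w.x + b = 0."""
    return (x @ w + b) / np.linalg.norm(w)


def rank_by_certainty(x, w, b):
    """Indices from the most negative to the most positive distance.

    Samples at either end are the confident decisions; those near 0 are ambiguous.
    """
    return np.argsort(signed_distances(x, w, b))


w, b = np.array([3.0, 4.0]), 0.0  # hypothetical hyperplane, ||w|| = 5
x = np.array([[3.0, 4.0], [0.0, 0.0], [-3.0, -4.0], [1.0, 0.0]])
d = signed_distances(x, w, b)
order = rank_by_certainty(x, w, b)
```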
Scan-based classification
To obtain a classification for the whole scan (instead of individual patches, as in the previous sections), we classify all foreground patches of the scan and aggregate the predictions by selecting the most frequently predicted species. As presented in Table LABEL:tab:test_scan_based, deep Fisher Vector also performs better than the other methods in this case, obtaining better accuracy than the best baseline method (ResNet18).
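The scan-level decision is a majority vote over patch predictions. A minimal sketch; ignoring any foreground patch that is still predicted as background (BG) is an assumption, as is the tie-breaking behavior, where `Counter.most_common` returns the label that was counted first among equally frequent ones.

```python
from collections import Counter


def classify_scan(patch_predictions, background_label="BG"):
    """Most frequently predicted species among the foreground patches of one scan."""
    votes = Counter(p for p in patch_predictions if p != background_label)
    if not votes:
        return None  # no species vote at all
    return votes.most_common(1)[0][0]


scan_label = classify_scan(["CA", "CG", "CA", "BG", "BG", "BG"])
```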
Conclusions and future work
In this paper, we apply deep neural networks and bag-of-words approaches to classify microscopic images of various fungi species. According to our experiments, the combination of features from deep neural networks with Fisher Vector works better than fine-tuning the classifier’s block of the well-known network architectures and has the potential to be successfully used by microbiologists in their daily practice.
A large part of this paper is dedicated to the explainability of deep bag-of-words approaches, to increase trust in deep neural networks. For this purpose, we introduce an in-depth visual description based on properties pre-defined by the microbiologists. We hope that it will help to better understand the similarities and differences between fungi species.
In our experiments, we assumed that images are obtained from the same laboratory with the same scanner (details are presented in the Materials). However, in our opinion, this method could be easily extended to more diverse datasets by using additional preprocessing steps that unify the input data. Due to the lack of data for such experiments, we did not cover this issue in the current article; however, it is planned for future research. Moreover, we would like to extend the DIFaS database so that it contains more preparations for all species, also gathered from other laboratories and scanners. Finally, we plan to prepare scans containing more than one species, as the automatic classification of such images would help to exclude the culture phase from the microbiological pipeline.
Acknowledgments
This work was supported by the National Science Centre, Poland, under grant no. 2015/19/D/ST6/01215.
1. Maiken CA. Candida and candidaemia. Susceptibility and epidemiology. Danish medical journal. 2013;60(11):B4698.
2. Rodrigues C, Silva S, Henriques M. Candida glabrata: a review of its features and resistance. European Journal of Clinical Microbiology & Infectious Diseases. 2014;33(5):673–688.
3. Trofa D, Gácser A, Nosanchuk JD. Candida parapsilosis, an emerging fungal pathogen. Clinical Microbiology Reviews. 2008;21(4):606–625.
4. Silva S, Negri M, Henriques M, Oliveira R, Williams DW, Azeredo J. Candida glabrata, Candida parapsilosis and Candida tropicalis: biology, epidemiology, pathogenicity and antifungal resistance. FEMS Microbiology Reviews. 2012;36(2):288–305.
5. Silveira FP, Husain S. Fungal infections in solid organ transplantation. Medical Mycology. 2007;45(4):305–320.
6. Papagianni M. Characterization of fungal morphology using digital image analysis techniques. J Microb Biochem Technol. 2014;6(4):189.
7. Lakner A, Essig A, Frickmann H, Poppert S. Evaluation of fluorescence in situ hybridisation (FISH) for the identification of Candida albicans in comparison with three phenotypic methods. Mycoses. 2012;55(3):e114–e123.
8. Ferrer C, Colom F, Frasés S, Mulet E, Abad JL, Alió JL. Detection and identification of fungal pathogens by PCR and by ITS2 and 5.8S ribosomal DNA typing in ocular infections. Journal of clinical microbiology. 2001;39(8):2873–2879.
