Cell image classification: a comparative overview

Mohammad Shifat-E-Rabbi, Xuwang Yin, Cailey Elizabeth Fitzgerald, and Gustavo K. Rohde

arXiv:1906.03316·q-bio.QM·March 4, 2022

TL;DR

This paper reviews and compares three cell image classification methods—feature extraction, neural networks, and transport-based morphometry—across multiple datasets, highlighting their strengths and applications in biology and medicine.

Contribution

It provides a comprehensive comparison of three distinct approaches for cell image classification and evaluates their performance on various datasets.

Findings

01. Neural networks perform best on complex datasets.

02. Transport-based morphometry offers robustness to image variations.

03. Feature extraction methods are computationally efficient.

Abstract

Cell image classification methods are currently being used in numerous applications in cell biology and medicine. Applications include understanding the effects of genes and drugs in screening experiments, understanding the role and subcellular localization of different proteins, as well as diagnosis and prognosis of cancer from images acquired using cytological and histological techniques. We review three different approaches for cell image classification: numerical feature extraction, end to end classification with neural networks, and transport-based morphometry. In addition, we provide comparisons on four different cell imaging datasets to highlight the relative strength of each method.

Equations

$$\mu_{acc} = \frac{1}{k} \sum_{i=1}^{k} y_{i}^{(te)} \times 100\%$$

$$\sigma_{acc} = \frac{1}{k-1} \sum_{i=1}^{k} \left(y_{i}^{(te)} - \mu_{acc}\right)^{2} \times 100\%$$

$$\kappa = 1 - \frac{1 - p_{0}}{1 - p_{e}}$$

$$\Psi(A_{1}, \dots, A_{N}, r_{1}, \dots, r_{N}) = \sum_{m=1}^{N-1} \sum_{n=m+1}^{N} \int_{\Omega} \left| I_{m}^{o}(A_{m} x + r_{m}) - I_{n}^{o}(A_{n} x + r_{n}) \right|^{2} dx$$

$$F(\text{cell image}) = \begin{bmatrix} F_{1} = \text{area} \\ F_{2} = \text{perimeter} \\ F_{3} = \text{Haralick texture} \\ \vdots \\ F_{N} = \text{Euler number} \end{bmatrix}$$

$$\sigma(\Theta_{L} \, \sigma(\dots \sigma(\Theta_{2} \, \sigma(\Theta_{1} v + b_{1}) + b_{2}) + \dots) + b_{L}) = \begin{bmatrix} \text{OTN-1} \\ \text{OTN-2} \\ \text{OTN-3} \\ \vdots \\ \text{OTN-P} \end{bmatrix}$$

$$\int_{\mathbb{R}^{2}} I_{m}^{p}(x, y) \, dx \, dy = \int_{\mathbb{R}^{2}} I_{0}^{p}(x, y) \, dx \, dy = 1$$

$$\hat{I}_{m}^{p} = R(I_{m}^{p}); \quad \hat{I}_{0}^{p} = R(I_{0}^{p})$$

$$\hat{I}(t, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(x, y) \, \delta(t - x \cos(\theta) - y \sin(\theta)) \, dx \, dy$$

$$\int_{-\infty}^{f_{m}(t, \theta)} \hat{I}_{m}^{p}(\tau, \theta) \, d\tau = \int_{-\infty}^{t} \hat{I}_{0}^{p}(\tau, \theta) \, d\tau, \quad \forall \theta \in [0, \pi]$$

$$I_{m}^{t}(\cdot, \theta) = (f_{m}(\cdot, \theta) - id) \, \hat{I}_{0}^{p}(\cdot, \theta)$$

$$J(W) = \frac{W^{T} S_{b} W}{W^{T} S_{w} W}$$

$$\min_{w \in \mathbb{R}^{n},\, s \in \mathbb{R}^{N^{(tr)}}} \frac{1}{N^{(tr)}} \sum_{m=1}^{N^{(tr)}} s_{m} + \frac{\lambda}{2} \|w\|_{2}^{2} \quad \text{subject to} \quad s_{m} \geq 0, \;\; y_{m}^{(tr)} (w^{T} I_{m}^{(tr)}) \geq 1 - s_{m} \;\; \forall m \in [N^{(tr)}]$$

$$\min_{w} \; - \dots + (1 - y_{m}^{(tr)}) \log(1 - p(y = y_{m}^{(tr)} \mid I_{m}^{(tr)}, w)))$$

$$p(y = 1 \mid x, w) = g(w^{T} x) = \frac{1}{1 + e^{-w^{T} x}}, \qquad p(y = 0 \mid x, w) = 1 - p(y = 1 \mid x, w)$$

$$P(y = j \mid X = I_{m}^{(te)}) = \frac{1}{k} \sum_{i \in A} J(y_{i}^{(tr)} = j)$$
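The mean-accuracy and κ formulas listed above can be illustrated with a short sketch. It assumes that $y_{i}^{(te)}$ is the test accuracy of fold $i$ (as a fraction), and that $p_{0}$ and $p_{e}$ are the observed and chance agreement of Cohen's kappa computed from a confusion matrix. The excerpt does not define these symbols, so both readings are assumptions, and the function names are hypothetical.

```python
def mean_accuracy(fold_accuracies):
    """Mean of per-fold test accuracies (fractions in [0, 1]), in percent."""
    return 100.0 * sum(fold_accuracies) / len(fold_accuracies)


def cohen_kappa(confusion):
    """kappa = 1 - (1 - p_0) / (1 - p_e).

    confusion[i][j] counts samples of true class i predicted as class j.
    Assumption: p_0 is the observed agreement, p_e the chance agreement.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_0 = sum(confusion[c][c] for c in range(n_classes)) / total
    # Chance agreement: product of the marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(n_classes))
                  for c in range(n_classes)]
    p_e = sum(row_totals[c] * col_totals[c] for c in range(n_classes)) / total ** 2
    return 1 - (1 - p_0) / (1 - p_e)


# Hypothetical two-class confusion matrix, for illustration only.
kappa = cohen_kappa([[20, 5], [10, 15]])
```

Under this reading, κ is 1 for perfect agreement and 0 when the classifier agrees with the labels no more often than chance would.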

Full text

Mohammad Shifat-E-Rabbi

Imaging and Data Science Lab, Charlottesville, VA-22903, USA

Department of Biomedical Engineering, University of Virginia, Charlottesville, VA-22903, USA

Xuwang Yin

Imaging and Data Science Lab, Charlottesville, VA-22903, USA

Department of Electrical & Computer Engineering, University of Virginia, Charlottesville, VA-22903, USA

Cailey Elizabeth Fitzgerald

Imaging and Data Science Lab, Charlottesville, VA-22903, USA

Department of Biomedical Engineering, University of Virginia, Charlottesville, VA-22903, USA

Gustavo K. Rohde

Imaging and Data Science Lab, Charlottesville, VA-22903, USA

Department of Biomedical Engineering, University of Virginia, Charlottesville, VA-22903, USA

Department of Electrical & Computer Engineering, University of Virginia, Charlottesville, VA-22903, USA

Abstract

Cell image classification methods are currently being used in numerous applications in cell biology and medicine. Applications include understanding the effects of genes and drugs in screening experiments, understanding the role and subcellular localization of different proteins, as well as diagnosis and prognosis of cancer from images acquired using cytological and histological techniques. We review three different approaches for cell image classification: numerical feature extraction, end to end classification with neural networks, and transport-based morphometry. In addition, we provide comparisons on four different cell imaging datasets to highlight the relative strength of each method.

© Cytometry Part A (2020) 97 (4), 347-362. Permission from the journal must be obtained for all uses.

1 Introduction

Interpretation of images of cells has always played an important role in science and medicine. Since the discovery of cells in 1665, observation of their spatiotemporal characteristics through microscopy has enabled us to better understand the structure of living cells, as well as how they perform certain functions [1, 2]. In addition, scientists have long used microscopes to evaluate the efficacy of different compounds for drug development [3, 4, 5, 6, 7, 8, 9]. In medicine, as another example, the observation of cell morphology has long been used to discern malignancy in cancer cells [10, 11, 12, 13].

Cells are known to exhibit complex phenotypes such as differences in shape, gene expression, subcellular protein localization, and other qualities. In addition, cell cultures, tissues, and organs are known to exhibit complex heterogeneity of phenotypes. The combination of intricate phenotype differences together with their heterogeneous responses to different conditions (e.g. diseases) has made decoding biological processes an increasingly complex task. Thus computational approaches for analyzing images of cells have been used increasingly to aid in the task of decoding the complexity of biological processes. A common task useful in many practical situations is determining the category of a given cell or set of cells: a task known as cell classification.

Automated cell classification via computational analysis of images of cells has found numerous applications in science, technology, and medicine. Scientists have long used cell classification methods to determine whether a particular drug has affected a given culture of cells in the desired manner [3, 4, 5, 6, 7, 8, 9]. In pathology, an increasing number of researchers are exploring methods to automatically detect cancer based on classification of images of cells [10, 11, 12, 13]. As another example, geneticists have also resorted to automated cell classification in attempts to study gene silencing mechanisms [14, 15]. These and other applications are reviewed in the next section.

In the context of cell images, three broad categories of image classification algorithms are currently in use and are reviewed in this article: feature extraction and machine learning, neural networks, and transport-based morphometry.

• Feature engineering has been used for decades, by means of both manual and automated feature extraction [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]. Features may be hand-picked by experts to distinguish data between classes based on prior knowledge, or existing feature extraction software can compute thousands of data features for classification [27, 28]. Once numerical features are extracted, statistical regression methods for classification (e.g. linear discriminant analysis, random forests, neural networks) are then employed to build a cell image classification system.

• Artificial neural networks, especially deep learning models such as convolutional neural networks (CNNs), have emerged as the leading end-to-end classification systems [29, 30, 31, 32, 33]. CNNs bypass the feature extraction process and instead utilize a number of convolutional layers connecting the input data to the rest of the neural network. Their performance has surpassed previous benchmarks for some tasks, usually in domains where a large amount of data relative to the problem complexity is available for training.

• Transport-based morphometry (TBM) is a technique for modeling and feature extraction that utilizes the mathematics of optimal transport [34, 6]. Transport-based methods have been used successfully in biomedical data and image analysis tasks, such as statistical modeling and inverse problems [35]. The data transformation is constructed by comparing data vectors with respect to both their functional values (such as signal or pixel intensities) and their locations (such as time or pixel coordinates). Unlike linear modeling techniques, which compare values at fixed coordinates, transport-based methods are rooted in the physics of biological processes that are governed by the continuity equation.

The goal of this paper is to present a non-exhaustive comparative overview of cell image classification methods. We aim to review the three categories explained above in more detail, as well as provide a comparison of how they work in practice using four publicly available datasets. Source code implementing methods reviewed and compared here is available at https://github.com/rohdelab/cell-image-classification.

This paper is organized as follows. In Section 2, different applications of cell image classification are explained. The three main classification methods are briefly reviewed in Section 3, and more extensively described in Appendix A. The description of experimental (computational) methods is available in Section 4, with guidelines for using the source code in Appendix B. The results are presented in Section 5, with discussions in Section 6.

2 Applications

Cellular mechanisms underlie every action and development in the body and the natural world at large. Cellular dynamics partly determine the body’s response to drugs and treatment and the progression of disease, and cells carry genetic information that can elucidate why abnormalities occur in certain individuals and not others. Because of this, cell image classification has many important applications including drug discovery, digital pathology, genetic screening, and cell biology, among others.

Drug discovery

The drug discovery process involves many steps from identification of targeted symptoms and disease, creation of chemical compounds designed to effect changes in regions causing the underlying disease, and execution of ex-vivo and finally in-vivo experiments. One limiting factor that affects the drug discovery pipeline is the decision of which chemical compounds should be allocated experimental resources. There may be many compounds which have the potential to be effective, but a limitation in resources affects their ability to be tested. Scientists carry out ex-vivo experiments to identify the efficacy of potential drugs, and image classification is one way to evaluate which drugs perform the best.

Cell phenotype classification methods have been used many times to explore mechanisms of action, target efficacy, and toxicity of drugs [4, 36, 5]. For example, Loo et al.[37] profiled responses to different drugs using supervised cell image classification with support vector machines to separate drug-treated and untreated human cancer cells using 296 predetermined phenotypical features. Ljosa et al.[36] used both supervised and unsupervised classification algorithms, with support vector machines and Gaussian mixture models, respectively, to differentiate the modes of action (e.g., kinase inhibition, DNA replication, and cholesterol regulation) of a compendium of drugs (e.g., alsterpaullone, camptothecin, and mevinolin, among others). The classification was performed in the space of 453 predetermined image features of breast cancer cells that were extracted using CellProfiler [38].

Through high-throughput experimental methods, scientists are able to process large amounts of data necessary to identify which drugs to test further. Image-based cell classification is one way to test drug efficacy prior to in-vivo experiments, which are both costly and time-consuming. Researchers can treat specific cells with different drugs and examine the drug’s effect by profiling cellular perturbations and comparing across drug treatment conditions [3, 4, 6, 7]. Classification methods allow scientists to discern whether differences in cell phenotypes arise as a result of drug application. In this way, scientists may be able to identify which drugs have the greatest potential to alleviate targeted symptoms and conditions prior to in-vivo experiments. A consequence of this benefit is that through computational resources, many more drugs can reach the ex-vivo stage for assessment prior to selection of which drugs to test in-vivo [8, 9].

Digital pathology

One domain in which cellular classification techniques are already being implemented is diagnosis aided digital pathology. Through the combination of biopsy and imaging modalities, physicians can capture and examine a patient’s cells from various regions of their body. Cellular imaging has become an important diagnostic tool, and advances in imaging techniques have resulted in an abundance of data; clinical decision support systems have arisen as a way to harness that data. Histological specimens are typically analyzed by trained experts, but there is inter-observer variability that can result in varying diagnoses [39]. Computer-aided decision support systems increase the objectivity and reproducibility of diagnoses by relying on consistent algorithmic rules for image-based classification. These algorithms also provide a way to grade and quantitatively assess the progression of a disease, a task which has previously been assessed via qualitative measures. Rojo et al.[40] have identified 31 commercially available digital slide systems, ten of which are equipped with technology to detect abnormalities in histological specimens.

Advances in image classification have led to the first machine learning-based FDA-approved clinical decision support systems [41, 42]. A typical flowchart of steps taken by a clinical decision support system, specifically in relation to histological tasks, includes preprocessing, segmentation, feature extraction, dimension reduction, disease detection and classification, and post-processing and assessment [43]. Cell image classification methods thus play important roles in diagnosis, and have increased cellular image understanding as well as boosted predictive analysis technologies [44, 45]. For example, Guillaud et al.[46] proposed a cervical cancer diagnosis procedure employing a supervised classification method using linear discriminant analysis. Numerical features of cellular images were used to classify cervical cancers into different grades: normal cervix, koilocytosis, and three stages of cervical intraepithelial neoplasia. Petushi et al.[47] reported a breast cancer diagnosis technique using a decision tree classifier with textural features of Hematoxylin and Eosin stained histology images to separate invasive breast carcinoma into three histologic grades.

One case for which cellular imaging is used to determine disease state and early intervention is in patients with Barrett’s esophagus. Barrett’s esophagus is a pre-malignant condition that predisposes patients to the development and progression of esophageal adenocarcinoma [48]. The condition can be detected by assessing tissues collected via endoscopic biopsy for levels of intestinal dysplasia [49], but a standard white-light examination of four-quadrant biopsy has been shown to miss neoplasia in 57% of cases [50]. Because morphological changes (i.e. increasing grades of epithelial dysplasia) indicate the carcinogenic progression of Barrett’s mucosa, high-resolution imaging and cellular classification algorithms are important diagnostic tools [51]. In one study, a linear discriminant analysis-based classification algorithm on image features was used to increase diagnostic accuracy to 87% [52].

Genetic screening

Cell image classification techniques have extended the ability of researchers to investigate cellular genetic mechanisms. RNA interference (RNAi) is a term used to describe a mechanism which uses a cell’s own DNA sequence of a gene to silence it, or reduce its functionality, and has emerged as a powerful tool to assess gene function [53]. When the normal function of a gene is required for a given cellular mechanism, knockdown of that gene may lead to a phenotype which is detectable via assay and associated imaging techniques [54]. Genetic screening via RNAi has already led to critical new insights for a number of pathologies and processes including infectious disease, cancer, aging, and drug discovery, among others [55, 56, 57, 58, 59].

RNAi has increasingly been utilized in the context of cultured cells to perform large-scale genomic studies to identify multiple genes in a functional pathway. Because fluorescent microscopy allows one to visualize labeled proteins and their associated changes across conditions, high-throughput image classification tools can play an important role in helping researchers determine gene functionality and uncover previously uncharacterized phenotypes [60, 14, 15]. Supervised classification methods employing support vector machines with statistical features of segmented cell images (e.g., shape, texture, homogeneity, and brightness) were suggested in Conrad et al.[14] for separating different cellular morphologies in some RNAi screening applications. Hierarchical decision tree based classification has been used to separate different small interfering ribonucleic acid phenotypes to understand mechanisms of cell movement [61].

Cell Biology

Cellular level understanding of physiological processes can provide better insights into the mechanisms of tissue and organ systems. The use of computerized algorithms to aid in understanding biological processes is not new. The practice of karyotyping – the process of describing the number and appearance of chromosomes in a eukaryotic cell in order to identify genomic defects – has relied on automated procedures for over fifty years [62, 63]. Understanding the specific structure and characteristics of an organism’s genetic make-up can provide insights into developmental status. Eliminating the human operator allows for not only the acceleration of the tedious process of finding cells in mitosis and arranging the chromosomal images into karyograms but also the quantization and classification of chromosomal features [64]. There are numerous other applications of computerized cell image classification techniques in cell and systems biology.

Cell signaling mechanisms regulate all cellular activities and thus can influence molecular-level physiology. Researchers have used Gaussian mixture models to detect subcellular particles from epifluorescence microscopic images and understand fusion and separation events, as well as signal transduction mechanisms of cell-surface receptors [65]. Additionally, scientists have classified cell images of different treatment conditions to recognize signaling regulation of morphological properties and protrusion and adhesion mechanisms of cells, as well as to monitor the temporal and hierarchical relationships among different signaling networks [66]. In this application, supervised classification algorithms employing both linear discriminant analysis and neural networks were used with 154 numerical features of segmented cell images.

Automated cell image recognition techniques have also been used to discover important processes related to cell division, such as how chromosomes congress or segregate during mitosis [67], the manner in which microtubules are organized in vertebrate meiotic spindles, and the process by which the bipolarity of the spindle is maintained throughout the cell division [68]. Several works have employed computerized image detection techniques to recognize migratory patterns of cells explaining mechanisms of embryonic development, wound healing, and angiogenesis [69, 70, 71].

Boland et al.[27] classified subcellular protein localization patterns by training a neural network used with some predetermined numerical features of segmented HeLa cell images to understand the structures and functions of subcellular proteins. Monitoring the molecular architecture of every cell and analyzing tens of thousands of subcellular biomolecular markers is impractical for human operators [69, 72, 73]. Subcellular protein localization patterns provide information relating to the sequence, structure, and function of those proteins clarifying cell functions under different circumstances [74, 27].

One exciting application of cell image classification is the recognition of biological processes which are invisible to the human eye. Hidden processes can be paired with measurable phenotypes by the use of mathematical models, and these processes can thereby be detected in association with any phenotypical change. This approach was first taken to understand the mechanism of kinetochore positioning in the metaphase spindle of yeast. Despite the unresolvable sizes of kinetochores, their movements were reliably detected in this study [70].

3 Overview of image classification methods

We have grouped the classification methods for distinguishing images of cells into three main categories: numerical feature engineering, neural networks, and transport-based morphometry. Here we provide a brief overview of each category. Rather than being exhaustive in the citation, we instead focus on describing broad trends, while selecting a few exemplary papers in each category to describe more carefully. The discussion in this section is kept at the overview level. Appendix A contains a more detailed explanation, including mathematical descriptions, of the methods.

Numerical feature engineering

Numerical feature engineering (NFE) methods extract pre-determined features from segmented cells and use them to classify images with a statistical regression-based classifier [17, 74]. Cell images can be classified in their raw forms (e.g., pixel values) or can be summarized into some internal representation (or set of features) that may bear better discriminating information. A learning subsystem (e.g., a classifier) can then learn from these features and differentiate images more effectively.

To better understand the concept of feature extraction in image classification, let us refer to Fig. 1(a). The goal here is to classify two groups of cell images. A feature engineering method represents each of the image samples (from both groups) in terms of a predefined feature set. Each point in the scatter plot denotes a feature-space representation of one particular cell image from either of two groups. A classifier can then be trained in this feature space to effectively differentiate two image groups. Appendix A contains a more detailed explanation of numerical feature engineering methods.

Numerical feature-based cell fluorescence pattern classification has been used many times before [17, 18, 19, 20, 21, 22] to detect different antinuclear auto-antibodies relevant to systemic autoimmune diseases. Numerical features have been used in other examples [75, 23, 24, 76] to count different types of white blood cells in bone marrow and thereby help in the diagnosis of certain diseases, e.g., leukemia, acquired immunodeficiency syndrome, cancers, etc. Numerical feature-based cell image classification has also been used to understand the sequence, structure, and function of different subcellular proteins [25, 74]; and to differentiate histological images for automatic diagnosis of cancer [26]. Engineers have experimented with a large number of image features: area, perimeter, elongation, convex area, mean, variance, Haralick/Gabor/wavelet textures, Euler numbers, and many more. Ideally, extracted features should be selective of aspects of images that are more relevant to discrimination. But the best discriminating features vary across datasets. Selection of features is an open problem of feature engineering methods.
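To make the idea of a feature vector concrete, the following is a minimal sketch that computes four of the simpler features named above (area, perimeter, mean, and variance) from a segmented cell. It assumes the cell is given as a grayscale image plus a binary mask, both as nested lists. The function names and the 4-connected perimeter definition are illustrative choices, not the Wnd-chrm feature set.

```python
def shape_features(mask):
    """Area and perimeter of the foreground of a binary mask.

    Perimeter is counted as the number of pixel edges between foreground
    and background (or the image border), using 4-connectivity.
    """
    rows, cols = len(mask), len(mask[0])
    area = sum(sum(row) for row in mask)
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or not mask[rr][cc]:
                    perimeter += 1
    return area, perimeter


def intensity_features(image, mask):
    """Mean and (population) variance of the pixel values inside the mask."""
    values = [image[r][c]
              for r in range(len(mask))
              for c in range(len(mask[0]))
              if mask[r][c]]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def feature_vector(image, mask):
    """Concatenate the features into one vector F = [F_1, ..., F_N]."""
    area, perimeter = shape_features(mask)
    mean, variance = intensity_features(image, mask)
    return [area, perimeter, mean, variance]


# Toy 4x4 example: a 2x2 "cell" in the middle of the image.
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
image = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
features = feature_vector(image, mask)
```

Each cell image becomes one point in this four-dimensional feature space. Which features separate the classes best depends on the dataset, as noted above.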

To evaluate the performance of the numerical feature engineering category of methods, we used image features extracted by Wnd-chrm [77]. Performance metrics were obtained via implementation of existing classification algorithms from the scikit-learn package [78]: random forests (RF), k-nearest neighbors (k-NN), linear support vector machines (SVM-l), logistic regression (LR), linear discriminant analysis (LDA), kernel support vector machines with radial basis function (SVM-k), and penalized linear discriminant analysis (PLDA). Appendix B has more information regarding implementation specifics.
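As an illustration of how one of these classifiers operates in feature space, here is a minimal k-nearest-neighbors sketch. It uses only the Python standard library and is not the scikit-learn implementation used in the paper. It assumes Euclidean distance and estimates the class probability as the fraction of the k nearest training samples carrying each label. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_posteriors(train_x, train_y, query, k=3):
    """Estimate P(y = j | query) as the label fraction among the k nearest samples."""
    # Indices of training samples, ordered by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    neighbours = order[:k]
    votes = Counter(train_y[i] for i in neighbours)
    return {label: count / k for label, count in votes.items()}


def knn_predict(train_x, train_y, query, k=3):
    """Return the label with the largest estimated probability."""
    posteriors = knn_posteriors(train_x, train_y, query, k)
    return max(posteriors, key=posteriors.get)


# Toy two-dimensional feature vectors for two classes.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(train_x, train_y, [0.2, 0.2], k=3)
```

In practice the features would usually be standardized first, since a Euclidean distance is dominated by whichever feature has the largest numerical range.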

Neural networks

Neural networks (NN) are mathematical representation functions that came about via combination of multiple perceptrons [79]. A neural network-based classification system learns the features from the raw data and classifies the input data based on these features. They have recently gained popularity in image classification for their high accuracy. Here we review some of the main neural network techniques.

A popular neural network model is the multilayer perceptron (MLP) [80]. An MLP is built from a sequence of non-linear modules. Each module comprises a linear transformation unit followed by a nonlinear differentiable activation (generally “ReLU”, “sigmoid”, or “tanh”) unit. Data samples, in their vectorized forms, enter the MLP from the input side and are successively transformed through all layers of MLP modules to generate outputs. Each layer of an MLP module transforms the input to a more abstract and higher-level representation. An interplay of these interconnected transformation layers enables the model to learn very complex functions. Because MLPs take input image data in vectorized form, they discard the spatial coherence of nearby pixels. Moreover, they are not always invariant to transformations (e.g., translation, scaling, and deformation) that may be irrelevant in the context of some applications.
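The layer-by-layer transformation described above can be sketched in a few lines. This is a forward pass only, with no training. It assumes each layer is given as a weight matrix and bias vector (written here as `theta` and `b`), and the weights in the usage example are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(theta, b, v, activation):
    """One MLP module: linear transformation theta @ v + b, then a nonlinearity."""
    return [activation(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(theta, b)]


def mlp_forward(layers, v, activation=sigmoid):
    """Apply the modules in sequence to a vectorized input v.

    layers is a list of (theta, b) pairs, one per module.
    """
    for theta, b in layers:
        v = dense(theta, b, v, activation)
    return v


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
          ([[1.0, 1.0]], [0.0])]
output = mlp_forward(layers, [2.0, 2.0], activation=relu)
```

Because the input is a flat vector, permuting the pixel order (with the first-layer weights permuted to match) leaves the computation unchanged. This is one way to see that an MLP does not use the spatial arrangement of pixels.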

Convolutional neural networks (CNNs) are a class of neural networks [81, 82, 83] that are well suited to image data. Spatial coherence and transform invariance are built into the structure of CNNs. By considering local information in images, they are sensitive to relevant minute variations and invariant to irrelevant large variations (e.g., translation and scaling). CNN layers have two different kinds of structures. The first few layers are comprised of a combination of convolutional, non-linear activation, and pooling layers. The remaining layers are denoted as fully connected layers that operate on data in their vectorized forms. A convolutional layer is subdivided into a number of small patches. Each patch extracts features from small sub-regions in images, filters those features, and passes the filtered features to the next layer. Each small patch in a particular layer shares the same sets of filter weights so that the same local pattern is detected at different locations of the image. Multiple features are detected with multiple convolutional layers. The pooling layers merge similar features together and reduce dimensionality to some extent. The lower-order information gathered in convolutional and pooling layers is combined with the higher-order features of fully connected layers later to compile the information about the image as a whole. The final decision layer is generally followed by a “softmax” function [79] that normalizes the outputs into numbers in the interval $(0,1)$.
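The weight sharing, pooling, and softmax steps described above can be illustrated with a small sketch in plain Python. It assumes a single hand-picked 2x2 filter rather than learned weights, and it computes a cross-correlation, which is what CNN libraries commonly call convolution. The toy images and function names are hypothetical.

```python
import math


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu_map(feature_map):
    """Element-wise ReLU nonlinearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling, which reduces each dimension by `size`."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


def softmax(scores):
    """Normalize scores into positive numbers that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The same 2x2 blob placed at two different locations of a 5x5 image.
blob_top_left = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
blob_shifted = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0],
                [0, 0, 1, 1, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 0]]
kernel = [[1, 1], [1, 1]]  # a filter that responds to 2x2 bright blobs

map_a = relu_map(conv2d_valid(blob_top_left, kernel))
map_b = relu_map(conv2d_valid(blob_shifted, kernel))
pooled_a = max_pool(map_a)
pooled_b = max_pool(map_b)
```

Because the filter weights are shared across positions, the peak response is the same in both feature maps and only its location differs. Pooling then coarsens that location information.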

A general architecture of a CNN is presented in Fig. 1(b). An input image sample is processed by sub-units in convolutional layers with convolution and nonlinear mapping. The outputs of the convolutional layers are then passed to the pooling layers for semantic merging. The final layers of CNNs consist of fully connected layers. Multiple instances of convolutional, pooling, and fully connected layers are generally present in a standard CNN. A neural network-based classification system outputs a predicted class (e.g., benign, malignant, etc.) for an input image. Appendix A contains a more detailed mathematical description of neural networks.

Neural network-based cell image classification has been used to classify different staining patterns of human epithelial type-2 cells for detecting auto-antibodies of different autoimmune diseases [31, 32, 33], identify different subcellular localization patterns of proteins [27], and diagnose cervical cancer [30]. Neural networks have outperformed many sophisticated artificial intelligence algorithms in the task of image classification but may require a large number of image samples to perform reliably [33, 84].

From the neural networks (NN) category, we compared the performances of a multilayer perceptron (MLP) and a few convolutional neural networks (CNN): a shallow CNN implementation with one convolutional layer [85], VGG16 (a deep neural network with 16 weight layers: 13 convolutional and 3 fully connected) [81], and Inception-V3 (also a deep neural network with 41 convolutional layers) [82, 83]. VGG16 and Inception-V3 were implemented both without (VGG16, INCv3) and with (VGG16-T, INCv3-T) transfer learning using “imagenet” weights [86]. More information regarding implementation specifics can be found in Appendix B.

Transport-based morphometry

Transport-based morphometry (TBM) methods, guided by the mathematics of optimal mass transport, decode differences among images by quantifying the least effort required to morph the images into a reference image [35, 6, 87]. TBM methods transform raw image data into a representation that facilitates both image classification and visualization of biologically interpretable differences between classes. Embodiments of this nonlinear image representation method include the Radon cumulative distribution transform [87], continuous linear optimal transport [88], and discrete linear optimal transport [6], among others.

The Wasserstein distance between two images can quantify the optimal transport of mass (image intensities, in the case of an image) required to morph one image into the other [34, 6]. The weighted Euclidean distance between images in transport space is closely related to the Wasserstein distance between them in image space (refer to Basu et al.[6], Kolouri et al.[89] for more details). TBM methods measure linear embeddings for all input images by computing their Wasserstein distances to a pre-computed reference image. This linear embedding produces the transport space representations of all the image samples. After image transformation, TBM employs a statistical regression-based classifier in transport space to separate data classes. One exclusive property of TBM methods is that it is possible to visualize and interpret any regression in transport space and thereby understand biologically interpretable differences across classes.
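To give a feel for what a Wasserstein distance measures, the following is a minimal one-dimensional sketch. It is a simplification of the two-dimensional image setting and is not the method used in the paper. It assumes each signal is represented by an equal number of unit-mass sample positions, in which case the optimal transport plan for the squared-distance cost simply matches the sorted samples. The function name is hypothetical.

```python
from math import sqrt


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size sets of 1D sample positions."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes the same number of samples")
    # In 1D, the optimal plan pairs the i-th smallest x with the i-th smallest y.
    xs, ys = sorted(xs), sorted(ys)
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# A "blob" of mass and the same blob translated by 5 units.
blob = [0.0, 1.0, 2.0]
shifted = [5.0, 6.0, 7.0]
distance = wasserstein2_1d(blob, shifted)
```

Here the distance equals the size of the shift, so it keeps growing as the blob moves further away. A pixel-wise comparison of two non-overlapping blobs, by contrast, would report the same difference regardless of how far apart they are.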

To understand the TBM mechanism, please refer to Fig. 1(c). Each point in the scatter plot in Fig. 1(c) embodies the transport space representation of an image sample in the subspace of the two most discriminant directions computed by penalized linear discriminant analysis (PLDA). A classifier is trained in the transport space to separate transport space representations of input images into different classes. Moreover, the differences among the classes are visualized by the inverse transform property of TBM. The representative images in the panels along the x and y-axes in Fig. 1(c) show the visualizations of class differences along the first and second most discriminant PLDA directions, respectively. Appendix A contains a more extensive description of TBM methods.

Transport-based morphometry has been used in a number of cell image classification problems: classification of liver hepatoblastoma by analyzing nuclear chromatin distribution [6, 34], classification of subcellular protein localization patterns in HeLa cells [34], detection of follicular adenoma and carcinoma using histological images of thyroid nuclei [90, 6], and automated screening for cell phenotype changes as a result of drug treatment [6], to name a few.

From the transport-based morphometry (TBM) category, we implemented the Radon cumulative distribution transform (R-CDT) [87] coupled with the same statistical regression-based classifiers that were used with the numerical feature engineering category. Appendix B has more information regarding implementation specifics. For both the transport-based morphometry and the numerical feature engineering categories, we did not implement SVM-k for the human epithelial cell dataset (Hep2) due to the computational complexity arising from a large dataset (63,445 images).

4 Experimental Setup

To test and compare the performances of three broad categories of cell image classification algorithms, we selected methods from a few exemplary papers: Wnd-chrm [77] from numerical feature engineering; multilayer perceptron [80] and three convolutional neural network architectures – an existing shallow CNN [85], VGG16 [81], and Inception-V3 [82, 83] – from neural networks; and the Radon cumulative distribution transform [87] from transport-based morphometry. Details of all these methods with mathematical descriptions are presented in Appendix A.

To evaluate the performance of each method, we trained each classification model using a portion (called training set) of a given dataset. The classification performance of the model was computed using the remaining portion (called the validation or test set) of the dataset. To reduce over-fitting or selection bias, the model performance was evaluated $k$ times using $k$ different partitions of training and testing sets via $k$ -fold ( $k=10$ ) cross-validation. All datasets were preprocessed as follows: all the images were centered such that the center of mass of each image occurs at the center of view of each image, oriented such that their major axes are aligned, and flipped such that they have similar intensity weight distribution (see Appendix A for details). Features were scaled to standardize the range of features using the standard scaling function in Python’s scikit-learn package [78]. Principal component analysis was used to reduce data dimensionality.
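The evaluation protocol described above (standard scaling, PCA, and 10-fold cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' code: the synthetic data, the choice of 10 principal components, and the logistic regression classifier are assumptions made only for the example. Placing the scaler and PCA inside the pipeline means they are refit on each training fold, which is one reasonable way to keep test data out of the fitting step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for feature vectors (assumption): 200 samples, 50 features, 2 classes.
X = rng.normal(size=(200, 50))
y = np.repeat([0, 1], 100)
X[y == 1, :5] += 2.0  # shift a few features so the two classes differ

# Scaling and PCA live inside the pipeline, so each fold fits them on training data only.
# n_components=10 is an illustrative value, not one taken from the paper.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(max_iter=1000),
)

# Stratified 10-fold cross-validation, one accuracy value per fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean(), scores.std(ddof=1))
```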

Datasets

Four different datasets with different classification problems were used in this overview: HeLa cell [74, 27], human osteosarcoma cell [6, 91, 38], thyroid nuclei [6, 90], and human epithelial cell [84] image dataset. We used datasets where the cell images were already segmented. The classification performances of different methods were evaluated on each of these cell image datasets.

HeLa dataset

The HeLa cell image dataset concerns a protein characterization problem in the field of functional genomics or proteomics: detecting subcellular protein localization patterns. The associated classification task is to separate major subcellular protein localization patterns to identify the distribution and function of expressed proteins. Segmented fluorescence microscopy images of HeLa cells have been collected from Murphy et al.[74] and Boland et al.[27]. The HeLa cell image dataset contains 10 subcellular localization patterns of the major organelles: endoplasmic reticulum protein (ER), Golgi protein GPP130 (Gpp), mitochondrial protein (Mit), filamentous actin labeled with rhodamine-phalloidin (Act), cytoskeletal protein tubulin (Tub), DNA labeled with DAPI (Dn), Golgi protein giantin (Gia), lysosomal protein LAMP2 (La), nucleolar protein nucleolin (Nuc), and transferrin receptor (in endosomes) (Tfr) (see Fig. 2).

Human osteosarcoma cell dataset

The human osteosarcoma cell dataset [6, 91, 38] is used for the task of examining the underlying trend of the cytoplasm-to-nucleus translocation of the forkhead fusion protein (FKHR-EGFP). The aim is to quantify the translocation of FKHR-EGFP with the infusion of Wortmannin dosage in stably transfected human osteosarcoma (U2OS) cells. Localized in the cytoplasm, FKHR usually moves towards the nucleus and then is transported out by export proteins. If this export is inhibited by some drug, e.g., Wortmannin, FKHR starts to accumulate in the nucleus. This accumulation may cause a cell phenotype change due to the export inhibition by Wortmannin. Images of human osteosarcoma cells have been collected from Carpenter et al.[38] and segmented using the method of Basu et al.[6]. In this overview, we have differentiated cells with no drugs (negative control) from cells with 150 nM Wortmannin added (positive control). Images from these two classes are illustrated in Fig. 2.

Thyroid nuclei dataset

The thyroid nuclei dataset is used to distinguish normal, benign, and malignant cell types from nuclear structures of thyroid cells [6, 90]. The task here is to distinguish among follicular adenoma (FA), follicular carcinoma (FTC), and normal (NA) thyroid cells (see Fig. 2) using only the information embedded in chromatin arrangements of the nuclei. FA and FTC are both neoplastic but only FTC is capable of metastases. Due to similarities in the nuclear features, differentiating between FA and FTC is difficult. Physicians have historically depended on histopathology to distinguish thyroid nuclei. After surgical removal, the lesion is examined for capsular or vascular invasion, the characteristic feature of FTC. Changes in nuclear structures of thyroid cells are visible in microscopic images. Image analysis-based classification techniques can detect these changes. Segmented images of three classes of thyroid nuclei have been collected from Basu et al.[6].

Human epithelial cell dataset

The dataset of human epithelial type-2 cells (Hep2) has been obtained from Qi et al.[84]. The dataset contains a large number (63,445) of segmented cell images from six different classes. This dataset is created by segmenting out single cell images from 948 specimen images of the ICPR 2014 HEp-2 cell classification contest dataset. The classification task in this paper is differentiating various staining patterns in indirect immunofluorescence Hep2 images to indicate different antinuclear auto-antibodies related to autoimmune diseases. The six image classes in this dataset are centromere (CE), Golgi (GO), homogeneous (HO), nucleolar (NUC), nuclear membrane (NUM), and speckled (SP). Representative cell images from each image class are displayed in Fig. 2.

Evaluation Measures

Percentage accuracy

Each classification model was trained with a training set and evaluated with a testing set. We calculated the percentage accuracy of correctly predicting the class labels for test data in a given fold (or partition) of cross-validation. Let $y_{i}^{(te)}$ be the percentage accuracy for a model for a testing set of the $i$ -th fold. The mean, $\mu_{acc}$ , and standard deviation, $\sigma_{acc}$ , of the classification accuracy were calculated over $k$ folds of cross-validation as

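$$\mu_{acc}=\frac{1}{k}\sum_{i=1}^{k}y_{i}^{(te)}\times 100\%,\qquad\sigma_{acc}=\sqrt{\frac{1}{k-1}\sum_{i=1}^{k}\left(y_{i}^{(te)}-\mu_{acc}\right)^{2}}\times 100\%$$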
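As a small worked example of these two summary statistics, the sketch below computes the mean and the sample standard deviation (with the $1/(k-1)$ normalization) over $k=10$ folds. The per-fold accuracies are hypothetical values chosen for illustration, and the helper name `summarize_accuracy` is not from the paper.

```python
import statistics

def summarize_accuracy(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies, in percent."""
    percents = [100.0 * a for a in fold_accuracies]
    mean = statistics.fmean(percents)
    std = statistics.stdev(percents)  # sample standard deviation, 1/(k-1) normalization
    return mean, std

# Hypothetical accuracies (fractions in [0, 1]) from k = 10 folds, for illustration only.
folds = [0.90, 0.92, 0.88, 0.91, 0.89, 0.93, 0.90, 0.87, 0.92, 0.88]
mean_acc, std_acc = summarize_accuracy(folds)
print(f"{mean_acc:.1f} +/- {std_acc:.1f} %")
```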

Kappa statistic

Cohen’s kappa statistic, $\kappa$ , measures the agreement among different realizations (partitions of cross-validation) of the algorithm by taking into account the agreement occurring by chance. $\kappa$ can be computed as

$$\kappa=1-\frac{1-p_{0}}{1-p_{e}}$$

where $p_{0}$ is the relative observed accuracy and $p_{e}$ is the hypothetical probability of accuracy by chance (refer to Viera et al.[92] and Landis et al.[93] for more details).
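The formula can be illustrated with a short computation from a confusion matrix. The sketch assumes that $p_{0}$ is the fraction of samples on the diagonal and that $p_{e}$ is the chance agreement implied by the row and column totals, which is the usual convention for Cohen's $\kappa$; the confusion matrix itself is hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true class, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p0 = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return 1.0 - (1.0 - p0) / (1.0 - pe)

# Hypothetical two-class confusion matrix: p0 = 0.85 and pe = 0.5.
kappa = cohen_kappa([[45, 5], [10, 40]])
print(round(kappa, 3))  # 0.7
```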

5 Results

In this section, we present the comparative performances of different cell image classification techniques in terms of percentage accuracy and $\kappa$ -statistic values. The classification techniques were divided into three broad categories: numerical feature engineering methods (NFE), neural network-based methods (NN), and transport-based morphometry methods (TBM). The methods were evaluated on four datasets: HeLa cell (HeLa), human osteosarcoma cell (U2OS), thyroid nuclei (ThN), and human epithelial cell (Hep2). A summary of all percentage accuracy and $\kappa$ -statistic results computed in this paper is presented via bar graphs in Fig. 3.

In the HeLa, U2OS, and ThN datasets, the classification accuracies within the NFE category (Wnd-chrm) are, in a majority of cases, higher than or equivalent to classification accuracies within the other two broad categories (Fig. 3). The performance superiority, in terms of classification accuracies, of the NFE-based category of methods for these three datasets is consistent across different classifier implementations, with a few exceptions, e.g., k-NN in HeLa and ThN, RF in HeLa, U2OS, and ThN, etc.

In contrast, for the Hep2 dataset, the classification accuracies of the majority of methods within the NN category are comparatively higher than the accuracies within the NFE and TBM-based categories of methods (Fig. 3). All of the CNN architectures implemented (Shallow, VGG16, VGG16-T, INCv3, INCv3-T) outperform, in terms of classification accuracies, all classifiers in each of the two other broad categories of methods.

The classification accuracies within the TBM category of methods (R-CDT) used in this paper vary across datasets and classifiers. Some classifiers implemented with the R-CDT produce results equivalent to a feature extraction method or a convolutional neural network. For example, in the U2OS dataset, R-CDT with LR (accuracy $=95\pm 3.1$ %, $\kappa=0.89$ ) performs similarly to SVM-k with Wnd-chrm features (accuracy $=96\pm 2.9$ %, $\kappa=0.92$ ), and in the ThN dataset, R-CDT with LR (accuracy $=62\pm 3.9$ %, $\kappa=0.43$ ) performs similarly to INCv3 CNN (accuracy $=62\pm 4.2$ %, $\kappa=0.42$ ) (Fig. 3). In most of the cases, however, the TBM-based category of methods tested in this paper does not outperform NN and NFE-based methods.

The ability of the TBM-based category of methods to visualize differences among the classes of a given dataset is presented in Fig. 4. We have chosen the NA and FA classes of the ThN dataset for demonstration. The histograms of the projections of R-CDT transformed test images along the most significant direction (computed using PLDA on training images) of differences between NA and FA cells are presented in the top panel, and the modes of variation of the images along the same discriminant direction are illustrated in the bottom panel in Fig. 4. The representative images at a particular coordinate along the discriminant direction correspond to the histograms at the same coordinate in the top panel. Important differences between the normal (NA) and follicular adenoma (FA) of the thyroid are evident in Fig. 4. It can be seen that peripheral chromatin concentration increases as the cell progresses from normal (NA) to follicular adenoma (FA) of the thyroid. The presence of peripheral rings and the decline of homogeneity of chromatin concentration are revealed by the TBM mechanism to be the possible underlying differences between follicular adenoma (FA) and normal (NA) thyroid tissue.

To understand the impact of the dataset size on the classification accuracy of neural networks, we measured the percentage accuracy of VGG16-CNN for different training dataset sizes. To that end, we selected the dataset with the largest number of data samples (Hep2 – 63,445 images). We sub-sampled the Hep2 dataset into sub-datasets with training dataset sizes ranging from $2000$ to $55,000$ and evaluated the performance of VGG16-CNN on each subset with $10$ -fold cross-validation. In this experiment, the percentage accuracy of VGG16-CNN increased as the number of training images increased (Fig. 5).
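The sub-sampling idea can be sketched on synthetic data as follows. This is only an illustration of the procedure: a nearest-centroid classifier stands in for VGG16, the data are random Gaussian clusters, and a single held-out test set replaces the 10-fold cross-validation used in the paper. All names and sizes below are assumptions.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of a nearest-centroid classifier (a simple stand-in for a CNN)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every class centroid.
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_te).mean())

rng = np.random.default_rng(0)
n, dim = 3000, 20
y = rng.integers(0, 3, size=n)                    # three synthetic classes
X = rng.normal(size=(n, dim)) + 0.5 * y[:, None]  # class-dependent mean shift

# Hold out a fixed test set; draw training subsets of increasing size from the rest.
X_te, y_te = X[:1000], y[:1000]
X_pool, y_pool = X[1000:], y[1000:]

sizes = [20, 100, 500, 2000]
accs = []
for s in sizes:
    idx = rng.choice(len(X_pool), size=s, replace=False)
    accs.append(nearest_centroid_accuracy(X_pool[idx], y_pool[idx], X_te, y_te))

for s, a in zip(sizes, accs):
    print(s, round(100 * a, 1))
```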

The values of the standard deviation of the accuracy for all the methods are in an acceptable range in all four datasets. This attests to the reliability of the estimates. Also, Cohen’s $\kappa$ statistic, which measures agreement among different realizations, is $>0.6$ in most of the cases (indicating substantial or almost perfect agreement according to Landis et al.[93]), in the range $(0.2, 0.4)$ in many cases (indicating fair or moderate agreement according to Landis et al.[93]), and $<0.2$ in a few cases (indicating slight agreement according to Landis et al.[93]). The substantial or fair agreements as quantified by Cohen’s $\kappa$ statistic in most of the cases indicate equivalent partitions among data classes and underscore the robustness of the evaluation measures.

6 Discussions and Conclusion

This paper has presented a comparative overview of different cell image classification techniques. Currently available methods for cell image classification were divided into three basic categories: numerical feature engineering (Wnd-chrm), neural networks (MLP, CNN), and transport-based morphometry (R-CDT). The methods were tested and compared in a standardized manner on four different cell image datasets with different classification problems: subcellular protein localization from microscopy images [74, 27], the quantification of the FKHR-EGFP translocation variation due to the addition of a drug [6, 91, 38], differentiation between benign and malignant thyroid nuclei [6, 90], and classification of human epithelial datasets based on cell staining patterns [84].

Which method has the best accuracy?

The percentage accuracy of the numerical feature-based method, Wnd-chrm is the best or near the best in all four datasets. Wnd-chrm constructs a comprehensive feature vector with a large set of empirical representations (1025 features) of each image [77]. Though the extracted features, as well as their linear or nonlinear combinations used in classifiers, lack intuitive explanation regarding the underlying biological mechanisms, for the problems we tested, they exhibit high effectiveness in the context of cell image classification. Among the statistical regression-based classifiers that have been used with the Wnd-chrm features, the percentage accuracies of LR, LDA, SVM-k, and PLDA are consistently higher across all datasets.

As for the neural networks category of methods, we compared two kinds: multi-layer perceptron (MLP) and convolutional neural networks (CNN). The MLP method we tried consistently underperformed other methods. The CNN methods we tried produced state-of-the-art results only in classifying different staining patterns in Hep2 cell images but not in other classification problems. Unlike other datasets, the Hep2 cell image dataset contains a large number of image samples – 63,445 images (approximately 10,570 images per class). Other datasets do not have such a large number of image samples: the HeLa dataset contains 862 images (approximately 86 images per class), the U2OS dataset contains 492 images (approximately 246 images per class), and the ThN dataset contains 2053 images (approximately 684 images per class). Thus, we hypothesize that a large number of training samples may be necessary for neural networks to perform better.

The transport-based morphometry method, using the R-CDT, underperformed other methods in terms of percentage accuracy. However, R-CDT can potentially help to explain biological processes by providing visualizable models for mass distribution in cell images (see Fig. 4), and can thus elucidate underlying cell mechanisms and provide an opportunity for hypothesis generation.

How much data is enough for CNNs?

CNN-based methods have been used in many cell biology and digital pathology applications [84, 31, 32, 33, 27] where dataset sizes range from $\sim 800$ to $\sim 6\times 10^{4}$ . To understand the impact of dataset sizes on the performance of neural networks, we sub-sampled the largest dataset, Hep2 (63,445 image samples), into a few smaller datasets (with smaller training and testing datasets) and evaluated the performance of VGG16-CNN on each of them with a $10$ -fold cross-validation. We did not conduct this experiment on other datasets because they do not contain such a large number of image samples as the Hep2 dataset. The plot of the percentage accuracy of VGG16-CNN on the Hep2 dataset for different training dataset sizes is illustrated in Fig. 5. It can be seen that the VGG16-CNN outperforms Wnd-chrm with k-NN (accuracy = $79.39\pm 0.60\%$ ) when the training dataset size $\geq\sim 7,000$ . However, this lower bound cannot be used as a guideline for training dataset sizes for CNNs as their performances depend on both the number of training samples and the complexity of the problem.

Different applications in cell biology and medicine employ different numbers of cell images. The size of the datasets in some drug discovery applications [91, 38, 3] ranges from $\sim 500$ to $\sim 7\times 10^{7}$ . The number of data samples varies from $\sim 250$ to $\sim 6\times 10^{4}$ in many digital pathology applications [17, 44, 84, 75, 23, 76]. Some cell biology applications [74, 27] have dataset sizes of the order $\sim 800$ . Given the perceived lack of theory about CNNs, we cannot recommend a specific number of training images for CNNs for a specific application. But as is apparent from the experiment in Fig. 5, greater numbers of training images lead to better CNN performance, as measured by classification accuracy.

Does CNN performance depend on architectures?

In order to assess the performance of CNN applications to cellular classification tasks, we utilized two different frameworks, VGG-16 [81], developed in 2014, and Inception-V3 [82, 83], developed in 2016. Based on some experiments [82], it may be reasonable to expect that Inception-V3 would outperform VGG-16 for the same task. Instead, what we observe is, in some cases, the classification accuracy of Inception-V3 is lower compared with that of VGG-16 (e.g., ThN, Hep2 datasets). In addition to evaluating two frameworks, we investigated whether transfer learning [94], a technique widely adopted in some problems [95, 96, 97, 98, 99], would improve classification accuracies as well. In our classification problems, transfer learning improved the performance of CNNs to some extent, especially for datasets with a relatively lower number of samples (e.g., HeLa, ThN), but still did not outperform numerical feature engineering methods. Overall, our experimental results suggest that the choice of architecture may not significantly alter classification results.

Which method is the easiest to use?

A practical concern for researchers looking to perform cell image classification is which available method has the greatest ease of use. While considerations such as data size/type and theoretical reasoning are critical concerns, research is frequently constrained by realistic limitations such as time, computing power, and user expertise.

One important aspect to consider when choosing among various cell classification methods might be the time available for computation. Time concerns may depend on the type of question being asked and the associated urgency with which an answer is required. For example, in clinical use, determining whether cells from a tissue sample are benign or malignant is time-sensitive due to the role that the outcome plays in a patient’s course of treatment. In this case, a method with a computation time of multiple days may not be suitable. For other questions, however, such as general scientific inquiries, methods which require more time may be acceptable. In a similar vein, computation time is often associated with the amount of training data available. As noted previously, CNNs perform better as training data set size increases, but this increase in accuracy comes with an increase in computational time as well. The trade-off between incremental increases in accuracy and a rise in computation time should be assessed on a case-by-case basis.

In addition to time constraints, computational resources affect which classification methods may be used. Wnd-chrm and R-CDT calculations can be performed on low-level machines whereas CNN calculations require much higher computing power. This additional requirement may limit when CNNs can be executed.

An additional point of consideration when determining which classification method is most suitable for an application is the domain-specific expertise required to execute each method. Executing either feature extraction via Wnd-chrm or the R-CDT computation is a simple matter of inputting the correct dataset path and running the code, with no intermediate steps. The straightforward nature of these methods makes them suitable for settings in which individuals have little to no prior experience with classification techniques. CNNs, on the other hand, require additional steps beyond identifying the dataset path. Manual tuning of the batch size, learning rate, and step-size is a necessary step to achieving optimal accuracy. Adjusting these parameters appropriately requires some knowledge of both general coding techniques and machine learning theory, making this method suitable for use only in environments in which an individual has prior knowledge and experience with CNNs.

Which method should I use for my problem?

Beyond time, computing power, and expertise constraints, another important factor to consider in cell classification method selection is which method is best suited for the nature of the problem. If we are interested in understanding the underlying cell biology, the transport-based morphometry methods can provide a visual representation of any regression method. If we are interested primarily in achieving high classification accuracy, either feature engineering or neural network-based approaches are the preferred methods. In terms of classification accuracy, the numerical feature engineering methods may be suitable choices for cell image classification problems because they have the best or near the best accuracies in the datasets we tested. Neural network-based methods may be feasible when the number of training samples is very high (approximately on the order of 7,000 images or more for the Hep2 dataset). In many cell image classification applications, the number of images available may be low, and, therefore, neural network-based methods may not be optimal. Because selecting a method for cell image classification tasks is problem-specific, we have provided Python code that implements each method.

Acknowledgements

This work was supported in part by National Institutes of Health awards GM130825 and GM090033.

Source code

Source code is available at https://github.com/rohdelab/cell-image-classification.

Appendix A. Mathematical Details of Image Classification Methods

Image Preprocessing

All images underwent several preprocessing steps. We first converted all of the images to grayscale. Because we were interested in the class differences that were independent of rigid body transformations, e.g., rotations, translations, reflections, etc., we initialized the images after minimizing the following functional:

[TABLE]

where, $I_{m}^{o}(\mathbf{x})$ is the vectorized form of the $m$ -th sample of original raw images, $\mathbf{A}_{m}$ is a matrix parameterized by rotation and isotropic scaling, and $\mathbf{r}_{m}$ is the translation vector. The minimization of such functionals is computationally intensive. Therefore, to yield the preprocessed image, $I_{m}^{p}(\mathbf{x})$ , the following approximation was applied to the original raw image, $I_{m}^{o}(\mathbf{x})$ , as in Rohde et al.[100]: the center of mass of each image was translated to the center of view of each image, the principal axis of each image was aligned to a predetermined angle, and the images were flipped to have similar intensity weight distribution by switching co-ordinates until the functional was minimized.
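The first step of this approximation, moving the center of mass to the center of the field of view, can be sketched with NumPy as below. This is a simplified illustration rather than the authors' implementation: it uses an integer circular shift (`np.roll`), which assumes a zero background so that wrap-around is harmless, and it omits the principal-axis alignment and flipping steps. The function name `center_by_mass` is hypothetical.

```python
import numpy as np

def center_by_mass(image):
    """Circularly shift a 2D grayscale image so its center of mass sits at the array center."""
    img = np.asarray(image, dtype=float)
    total = img.sum()
    rows, cols = np.indices(img.shape)
    # Intensity-weighted mean row and column (the center of mass).
    com_r = (rows * img).sum() / total
    com_c = (cols * img).sum() / total
    # Integer shift that moves the center of mass to the geometric center.
    shift_r = int(round((img.shape[0] - 1) / 2 - com_r))
    shift_c = int(round((img.shape[1] - 1) / 2 - com_c))
    # np.roll wraps around the borders; this assumes an empty (zero) background.
    return np.roll(img, (shift_r, shift_c), axis=(0, 1))

# Toy image: a 3x3 blob placed away from the center of a 9x9 field of view.
img = np.zeros((9, 9))
img[1:4, 5:8] = 1.0
centered = center_by_mass(img)
```

After the shift, the blob occupies rows 3 to 5 and columns 3 to 5, so its center of mass coincides with the central pixel of the 9x9 array.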

Numerical feature engineering – Wnd-chrm

In numerical feature engineering, predetermined features are extracted from cell images, and those features are used to classify cells. For example, a vector of extracted features can be formulated for a particular image as

[TABLE]

The process of feature vector representation is repeated for all image samples. Once feature extraction is completed for all samples, a classifier is trained in this feature vector space to classify cell images. We selected Wnd-chrm [77] as the feature extraction program from the numerical feature engineering category for cell image classification. Although Wnd-chrm has built-in methods for classification, we used Wnd-chrm exclusively as a feature extraction method. Using Wnd-chrm features, the feature vector, $I_{m}^{f}(\mathbf{x})$ , for the $m$ -th image sample was constructed from the corresponding preprocessed image, $I^{p}_{m}(\mathbf{x})$ . Wnd-chrm extracts a large set of 1025 image descriptors including high contrast features (e.g., Gabor textures, edge statistics, object statistics, etc), polynomial decompositions (Chebyshev statistics, Zernike polynomials, etc), and statistics and textural information (first four moments, multiscale histogram, Haralick textures, Tamura textures, Radon transform statistics, etc) [77]. These features are calculated on raw pixel images and transforms of the images (wavelets, Fourier, Chebyshev transforms), as well as the transforms of the image transforms. For more details of Wnd-chrm-based feature extraction, refer to Orlov et al.[77].

Neural networks – MLP, CNNs

Multilayer perceptron (MLP), a class of feedforward neural network architecture, is given as a series of nonlinear modules containing a linear transformation unit followed by a non-linear activation unit. The MLP-based classification model for a $P$ -class classification problem is

[TABLE]

where $\mathbf{v}$ is the input image in the vectorized form, $\Theta$ is the weight matrix, and $\sigma:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is an activation function. OTN-P denotes the P-th output node of MLP that indicates the predicted class for an input image. In our work, the network parameters were trained with all the preprocessed training images, i.e., $\mathbf{v}=$ the vectorized form of the $m$ -th preprocessed training image, $I_{m}^{p}(\mathbf{x}),~{}\forall m$ . The gradient of the energy function was computed using the backpropagation algorithm [101].

Convolutional neural networks (CNN) have different structures in the initial modules (or layers). The first few layers of a CNN are comprised of convolutional, non-linear activation, and pooling layers. A convolutional layer extracts and filters features from small sub-regions in images and passes the filtered features to the next layer. The sub-regions in a particular layer share the same sets of filter weights. The pooling layers merge semantically similar features together. The rest of the layers are fully connected and operate on data in their vectorized forms.
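To make the three building blocks concrete, the following NumPy sketch applies one convolutional filter, a ReLU activation, and 2x2 max pooling to a toy image. It is a didactic illustration with a hand-picked edge filter, not a trained network and not the architecture used in the paper; as in most deep learning libraries, the "convolution" is implemented as a sliding-window cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding); the same weights are used at every position."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linear activation: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Merge each non-overlapping 2x2 block into its maximum (odd borders are cropped)."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# Hand-picked filter (assumption) that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])

features = relu(conv2d_valid(image, kernel))  # 5x5 feature map
pooled = max_pool2x2(features)                # 2x2 pooled map
```

In a full CNN the pooled maps would be flattened and passed to the fully connected layers, and the filter weights would be learned by backpropagation instead of being fixed by hand.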

Transport-based morphometry – R-CDT

From the transport-based morphometry category, a nonlinear and invertible image transform method known as the Radon cumulative distribution transform (R-CDT) [87] was selected. We selected R-CDT because it requires less computation and produces closed-form solutions without resorting to any kind of optimization algorithm. Before a classifier was run, each preprocessed image, $I^{p}_{m}(x,y)$ , was converted to its corresponding R-CDT representation. The procedure for calculating the R-CDT of an image had several steps. A template image, $I^{p}_{0}(x,y)$ , was computed as the mean of all of the $M$ preprocessed images, $I^{p}_{m}(x,y),~\forall m\in[M]$ . The image samples and the template image were normalized such that

[TABLE]

After normalization, $I_{m}^{p}$ and $I_{0}^{p}$ can be treated as probability density functions. Next, sinograms were obtained from the Radon transforms of the template and all the sample images as follows:

[TABLE]

where

[TABLE]

Provided that $\mu_{m}$ and $\sigma$ are continuous probability measures on $\mathbb{R}^{2}$ with corresponding probability density functions $I_{m}^{p}$ and $I^{p}_{0}$ , for a fixed angle $\theta$ , there exists a unique one-dimensional measure preserving map, $f_{m}(.,\theta)$ that warps $\hat{I}^{p}_{m}(.,\theta)$ into $\hat{I}_{0}(.,\theta)$ satisfying the following:

[TABLE]

The forward R-CDT for a sample image, $I_{m}(x,y)$ , is then defined as

[TABLE]

where, $id:\mathbb{R}\rightarrow\mathbb{R}$ is the identity function, $id(x)=x$ . After R-CDT computations were complete for all images, a classifier was trained in transform space to separate the cell images.
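The core one-dimensional step, which the R-CDT applies to each projection angle of the sinogram, can be sketched as follows. The sketch assumes the commonly used form of the cumulative distribution transform, $(f - id)\sqrt{\text{reference density}}$, and computes the measure preserving map by interpolating the inverse cumulative distribution function; it omits the Radon transform itself. The function name `cdt_1d` and the Gaussian test signals are illustrative assumptions.

```python
import numpy as np

def cdt_1d(signal, reference, t):
    """1D cumulative distribution transform of `signal` with respect to `reference` on grid `t`."""
    s = signal / signal.sum()        # normalize both to unit mass
    r = reference / reference.sum()
    cdf_s = np.cumsum(s)
    cdf_r = np.cumsum(r)
    # Measure preserving map: f(t) satisfies CDF_signal(f(t)) = CDF_reference(t).
    f = np.interp(cdf_r, cdf_s, t)
    dt = t[1] - t[0]
    # (f - id) weighted by the square root of the reference density (assumed definition).
    return (f - t) * np.sqrt(r / dt)

# Reference and signal: two unit-variance Gaussians whose means differ by 2.
t = np.linspace(-10.0, 10.0, 4001)
reference = np.exp(-0.5 * (t + 1.0) ** 2)
signal = np.exp(-0.5 * (t - 1.0) ** 2)

transform = cdt_1d(signal, reference, t)
dt = t[1] - t[0]
norm = np.sqrt(np.sum(transform ** 2) * dt)  # close to 2.0, the size of the translation
```

For a pure translation the map is $f(t)=t+2$, so the norm of the transform recovers the shift, which is also the 2-Wasserstein distance between the two densities. This illustrates why Euclidean distances in transform space carry transport information.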

Statistical regression-based classifiers

Statistical regression-based classifiers estimate linear or non-linear decision boundaries in the feature or transport space to classify images. In this paper, a number of classifiers were tested: linear discriminant analysis (LDA), penalized linear discriminant analysis (PLDA), support vector machine (with both a linear and radial basis function kernel: SVM-l and SVM-k, respectively), logistic regression (LR), random forests (RF), and k-nearest neighbors (k-NN). Most of the classifiers were used with the default settings of the scikit-learn [78] package in Python.

The vectorized forms of the R-CDT transform domain images, $I_{m}^{t}$ , and the Wnd-chrm feature images, $I_{m}^{f}$ (let us drop the indexing with ‘ $\mathbf{x}$ ’ or ‘ $(x,y)$ ’), were analyzed by statistical regression-based classifiers. We denote both $I_{m}^{t}$ and $I_{m}^{f}$ as $I_{m}$ in this section. The dataset was split into training and testing sets, $I_{m}^{(tr)}$ and $I_{m}^{(te)}$ , respectively, by stratified 10-fold cross-validation. The classifiers were trained on the training data and their performances were evaluated using the testing data. Brief descriptions of the classifiers used are presented below:

Linear Discriminant Analysis (LDA)

Linear discriminant analysis classifier differentiates data classes with a linear combination of features based on Fisher’s linear discriminant [102, 103]. LDA classifiers estimate a linear decision boundary by maximizing the following objective function with respect to the matrix of discriminant directions:

[TABLE]

where, $\mathbf{S}_{b}=\sum_{i=1}^{c}N_{i}^{(tr)}(\mu_{i}-\mu)(\mu_{i}-\mu)^{T}$ with $\mu=\frac{1}{N^{(tr)}}\sum_{\forall m\in[N^{(tr)}]}I_{m}^{(tr)}$ , $\mu_{i}=\frac{1}{N^{(tr)}_{i}}\sum_{I_{m}^{(tr)}\in\mathbf{w}_{i}}I_{m}^{(tr)}$ , $N^{(tr)}$ = number of total training data samples, $N^{(tr)}_{i}$ = number of training data samples in the $i$ -th class, $\mathbf{S}_{w}=\sum_{i=1}^{c}\sum_{I_{m}^{(tr)}\in\mathbf{w}_{i}}(I_{m}^{(tr)}-\mu_{i})(I^{(tr)}_{m}-\mu_{i})^{T}$ , $c$ is the number of linear discriminant directions, and $\mathbf{W}$ is a matrix formed by concatenating the discriminant directions, $\mathbf{w}_{i},~{}i\in[c]$ , in its columns. In addition to Fisher’s linear discriminant analysis (LDA) classifier, we also used a penalized version of linear discriminant analysis (PLDA) classifier [104].
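The scatter matrices defined above can be computed directly, as in the NumPy sketch below. The data are synthetic, and the two-class discriminant direction $\mathbf{S}_{w}^{-1}(\mu_{1}-\mu_{0})$ shown at the end is the textbook Fisher solution, included as an illustration and not as the exact solver used by scikit-learn or by the paper.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_b) and within-class (S_w) scatter matrices of the rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = (mu_c - mu)[:, None]
        S_b += len(Xc) * (diff @ diff.T)  # N_i (mu_i - mu)(mu_i - mu)^T
        centered = Xc - mu_c
        S_w += centered.T @ centered      # sum of (x - mu_i)(x - mu_i)^T within the class
    return S_b, S_w

# Synthetic two-class data (assumption): 50 samples per class in 3 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)), rng.normal(size=(50, 3)) + [2.0, 0.0, 0.0]])
y = np.repeat([0, 1], 50)

S_b, S_w = scatter_matrices(X, y)

# Two-class Fisher direction: solve S_w w = (mu_1 - mu_0).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
w = np.linalg.solve(S_w, mu1 - mu0)
```

A useful sanity check is that the two matrices sum to the total scatter of the data about the global mean.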

Support Vector Machine (SVM)

Support vector machine or the maximum margin classifier [105] aims to separate data classes with a hyperplane such that the separating margin between different classes is maximized. The hyperplane is estimated by minimizing the following constrained objective function:

[TABLE]

where $\mathbf{w}$ is the normal vector of the SVM hyperplane. A nonlinear version of SVM can also be constructed by incorporating a kernel function in the dual formulation of the SVM optimization problem [106]. We implemented both the linear (SVM-l) and the kernel (SVM-k) versions in this paper. The SVM classifiers were run with the default settings in the scikit-learn package [78].
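To make the maximum margin idea concrete, the sketch below assumes the standard hard-margin formulation, in which labels are $\pm 1$, every training sample must satisfy $y_{i}(\mathbf{w}^{T}x_{i}+b)\geq 1$, and the margin width is $2/\lVert\mathbf{w}\rVert$. The function names, the bias term `b`, and the toy data are assumptions for illustration. The code does not solve the optimization problem; it only compares two feasible hyperplanes.

```python
import math


def margin_constraints_satisfied(w, b, samples, labels):
    """Check y_i * (w . x_i + b) >= 1 for every sample (labels in {-1, +1})."""
    return all(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b) >= 1.0
        for x, y in zip(samples, labels)
    )


def margin_width(w):
    """Width of the separating margin, 2 / ||w||, for a hyperplane with normal w."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


# Toy 2-D data (assumed), linearly separable along the first axis.
samples = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, +1, +1]

# Two feasible hyperplanes; the one with the smaller norm has the wider margin.
w_a, b_a = (1.0, 0.0), -1.0   # margin 2.0, the largest possible for this data
w_b, b_b = (2.0, 0.0), -2.0   # margin 1.0, feasible but not optimal
```

Both hyperplanes separate the toy data, but minimizing $\lVert\mathbf{w}\rVert$ under the constraints selects the first one.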

Logistic Regression (LR)

The logistic regression classifier [107] – a form of binomial regression – uses a logistic or “sigmoid” function to identify the differences in data patterns. LR classifiers estimate a linear decision boundary by minimizing the following negative log-likelihood function:

[TABLE]

where

[TABLE]

and $\mathbf{w}$ is the linear decision boundary of the LR classifier. The default settings in the scikit-learn package [78] were used for the LR classifier.
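As a small worked example of the loss being minimized, the sketch below evaluates the negative log-likelihood for a given weight vector. It assumes the standard binary form with labels in $\{0,1\}$ and a bias handled by appending a constant feature. The function names and toy data are illustrative and are not taken from the released software.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def negative_log_likelihood(w, samples, labels):
    """Binary logistic-regression loss for weights w and labels in {0, 1}."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Toy 1-D data with a constant bias feature appended (assumed for illustration).
samples = [(-2.0, 1.0), (-1.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
labels = [0, 0, 1, 1]

loss_at_zero = negative_log_likelihood((0.0, 0.0), samples, labels)
loss_better = negative_log_likelihood((1.0, 0.0), samples, labels)
```

With all weights at zero every sample gets probability 0.5, so the loss is $4\ln 2$. A weight vector aligned with the class structure gives a lower loss.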

Random Forests (RF)

The random forests classifier [108], an ensemble learning algorithm, constructs multiple decision trees and classifies based on the mean or the mode of the prediction results of individual trees. A random forests classifier repeatedly selects random subsets of data with replacement from the training samples, $I_{m}^{(tr)}$ , and trains a decision tree classifier on each of the random subsets. After training, the prediction for a testing data sample, $I_{m}^{(te)}$ , can be made either by averaging or by taking the majority vote of the predictions from all the individual decision trees. We implemented RF classifiers with the default settings in the scikit-learn package [78].
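The bootstrap-and-vote procedure described above can be sketched in a few lines. To keep the example short, one-feature decision stumps (depth-1 trees) stand in for full decision trees, so this is bagging with majority voting rather than the scikit-learn random forests implementation. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(samples, labels, rng):
    """Draw len(samples) training pairs uniformly at random, with replacement."""
    picks = [rng.randrange(len(samples)) for _ in samples]
    return [samples[i] for i in picks], [labels[i] for i in picks]


def fit_stump(samples, labels):
    """Fit a one-feature threshold rule (a depth-1 tree) by exhaustive search."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            left = [y for x, y in zip(samples, labels) if x[feature] <= threshold]
            right = [y for x, y in zip(samples, labels) if x[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    """Apply the threshold rule to a single sample."""
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_ensemble(samples, labels, n_trees=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(samples, labels, rng)) for _ in range(n_trees)]


def predict_majority(ensemble, x):
    """Classify x by the mode of the individual predictions."""
    votes = Counter(predict_stump(stump, x) for stump in ensemble)
    return votes.most_common(1)[0][0]


# Toy 2-D data (assumed): the class depends on the first feature only.
train_x = [(0.1, 0.9), (0.2, 0.1), (0.3, 0.5), (0.4, 0.2),
           (0.6, 0.8), (0.7, 0.3), (0.8, 0.6), (0.9, 0.4)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_ensemble(train_x, train_y)
```

Calling `predict_majority(ensemble, x)` returns the mode of the individual predictions. Averaging class probabilities across trees is the other aggregation option mentioned above.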

$k$-Nearest Neighbors (k-NN)

The $k$-nearest neighbors classifier assigns a class to an unknown data point (a cell image, in our case) by a majority vote of the $k$ nearest training data points around it. For an unknown data point, $I_{m}^{(te)}\in\mathbb{R}^{d}$, the classifier forms a set $\mathcal{A}$ of the $k$ nearest (in terms of Euclidean distance) training points. The conditional probability that $I_{m}^{(te)}$ belongs to class $j\in\{0,1,\cdots,n\}$ is then estimated as

[TABLE]

where $J(x)$ is the indicator function, which is $1$ if $x$ is true and $0$ otherwise. After the conditional probabilities have been estimated, $I_{m}^{(te)}$ is assigned to the class with the largest probability. In our paper, the k-NN classifier was used with the default settings in the scikit-learn package [78].
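The estimate above is the fraction of the $k$ nearest training points that carry a given label. A minimal sketch, with assumed function names and toy data, is given below. The default `k=5` mirrors the default `n_neighbors` of scikit-learn's k-NN classifier.

```python
import math
from collections import Counter


def knn_class_probabilities(query, train_points, train_labels, k=5):
    """Estimate P(class = j | query) as the fraction of the k nearest
    training points (Euclidean distance) that carry label j."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(query, train_points[i]),
    )
    neighbors = order[:k]                      # the set A of k nearest points
    counts = Counter(train_labels[i] for i in neighbors)
    return {label: count / k for label, count in counts.items()}


def knn_predict(query, train_points, train_labels, k=5):
    """Assign the query to the class with the largest estimated probability."""
    probabilities = knn_class_probabilities(query, train_points, train_labels, k)
    return max(probabilities, key=probabilities.get)


# Toy 2-D training set (assumed for illustration).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

probs_near_origin = knn_class_probabilities((0.5, 0.5), train_points, train_labels, k=3)
probs_between = knn_class_probabilities((4, 4), train_points, train_labels, k=5)
predicted = knn_predict((4, 4), train_points, train_labels, k=5)
```

For the query at (4, 4) with $k=5$, three of the five neighbors belong to class 1, so the estimated probabilities are 0.6 and 0.4 and the point is assigned to class 1.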

Appendix B. Cell Image Classification Software

We have provided Python code implementing all the cell image classification methods reviewed in this paper. The source code for our experiments is available at https://github.com/rohdelab/cell-image-classification. The usage instructions for the software are as follows:

What packages have to be installed

To use the software, first install the following dependencies: Python 3.6, TensorFlow 1.13.1, scikit-learn 0.18.1, wnd-charm and its Python API (https://github.com/wnd-charm/wnd-charm), and the Python optimal transport library (https://github.com/LiamCattell/optimaltransport).

How data should be organized

To run the experiments on a cell image dataset, first create a directory under the ‘data’ directory of the downloaded folder. Then place the segmented and preprocessed images of different classes in different sub-directories of the newly created directory. As an example, the HeLa cell image dataset is provided with the Python software following the required organization.
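The layout described above (one sub-directory per class inside a dataset directory under ‘data’) can be checked with a short script before running any experiments. The sketch below is an illustration only: the helper name `list_labeled_images`, the accepted file extensions, and the class and file names are assumptions, and the script does not reproduce the loader in the released software.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = (".png", ".tif", ".tiff", ".jpg")  # assumed; adjust as needed


def list_labeled_images(dataset_dir):
    """Return (image_path, class_name) pairs, one class per sub-directory."""
    pairs = []
    for class_dir in sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs


# Demonstration on a throwaway directory tree that mimics the layout.
with tempfile.TemporaryDirectory() as root:
    dataset_dir = Path(root) / "data" / "example_directory"
    for class_name, file_name in [
        ("class_a", "img1.png"),
        ("class_a", "img2.png"),
        ("class_b", "img1.png"),
        ("class_b", "notes.txt"),   # non-image file, skipped by the helper
    ]:
        class_dir = dataset_dir / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        (class_dir / file_name).touch()
    found = [(path.name, label) for path, label in list_labeled_images(dataset_dir)]
```

The class label of each image is taken from the name of the sub-directory that contains it.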

What commands have to be run

After the dependencies are installed and the data are placed in the correct directories, the classification methods can be run from the command prompt. For example, to run logistic regression using Wnd-chrm features, run the following command in the top directory containing the downloaded code: “python main.py --dataset example_directory --space wndchrm --model LR”. The options for ‘--space’ are ‘image’, ‘wndchrm’, and ‘RCDT’, while the options for ‘--model’ are ‘RF’, ‘KNN’, ‘SVM’, ‘LR’, ‘LDA’, ‘PLDA’, ‘MLP’, ‘ShallowCNN’, ‘VGG16’, and ‘InceptionV3’. For detailed instructions please refer to the README file included in the GitHub repository.
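When several runs need to be scripted, the command line can be assembled programmatically. The sketch below uses only the flags and option values listed above. The helper `build_command` is hypothetical, and whether every combination of space and model is accepted by main.py is not stated here, so the README remains the reference.

```python
import shlex

SPACES = ("image", "wndchrm", "RCDT")
MODELS = ("RF", "KNN", "SVM", "LR", "LDA", "PLDA",
          "MLP", "ShallowCNN", "VGG16", "InceptionV3")


def build_command(dataset, space, model):
    """Return the argument list for one run of main.py, validating the options."""
    if space not in SPACES:
        raise ValueError(f"unknown space: {space!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return ["python", "main.py", "--dataset", dataset, "--space", space, "--model", model]


command = build_command("example_directory", "wndchrm", "LR")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the top directory of the downloaded code.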

Bibliography

[1] Robert Hooke. Micrographia: Or Some Physiological Descriptions of Minute Bodies Made by Magnifying Glasses. With Observations and Inquiries Thereupon. By R. Hooke, Fellow of the Royal Society. Jo. Martyn, and Ja. Allestry, printers to the Royal Society, 1961.
[2] Paolo Mazzarello. A unifying concept: the history of cell theory. Nature cell biology, 1(1):E13, 1999.
[3] Zachary E Perlman, Michael D Slack, Yan Feng, Timothy J Mitchison, Lani F Wu, and Steven J Altschuler. Multidimensional drug profiling by automated microscopy. Science, 306(5699):1194–1198, 2004.
[4] Christian Scheeder, Florian Heigwer, and Michael Boutros. Machine learning and image-based profiling in drug discovery. Current Opinion in Systems Biology, 2018.
[5] Jinghai J. Xu, Peter V. Henstock, Margaret C. Dunn, Arthur R. Smith, Jeffrey R. Chabot, and David de Graaf. Cellular imaging predictions of clinical drug-induced liver injury. Toxicological Sciences, 105(1):97–105, 2008.
[6] Saurav Basu, Soheil Kolouri, and Gustavo K Rohde. Detecting and visualizing cell phenotype differences from microscopy images using transport-based morphometry. Proceedings of the National Academy of Sciences, 111(9):3448–3453, 2014.
[7] Shinsuke Ohnuki, Satomi Oka, Satoru Nogami, and Yoshikazu Ohya. High-Content, Image-Based Screening for Drug Targets in Yeast. PLOS ONE, 5(4):1–11, 04 2010.
[8] Juan C Caicedo, Sam Cooper, Florian Heigwer, Scott Warchal, Peng Qiu, Csaba Molnar, Aliaksei S Vasilevich, Joseph D Barry, Harmanjit Singh Bansal, Oren Kraus, Mathias Wawer, Lassi Paavolainen, Markus D Herrmann, Mohammad Rohban, Jane Hung, Holger Hennig, John Concannon, Ian Smith, Paul A Clemons, Shantanu Singh, Paul Rees, Peter Horvath, Roger G Linington, and Anne E Carpenter. Data-analysis strategies for image-based cell profiling. Nature Methods, 14:849, aug 2017.
