Rotation Invariant Descriptors for Galaxy Morphological Classification
Hubert Cecotti

TL;DR
This paper evaluates various rotation-invariant descriptors for galaxy morphology classification, demonstrating a robust approach that achieves high accuracy in distinguishing elliptical and spiral galaxies using machine learning.
Contribution
The study compares multiple rotation-invariant descriptors and introduces a framework that achieves near-perfect classification accuracy for galaxy types.
Findings
High classification accuracy with AUC of 99.54%
Robustness of descriptors against noise and transformations
Effective binary classification of galaxy morphologies
[Tables: results on the artificial datasets. Columns: Dataset, Features, 11 classes, 5 classes, 3 classes. Three tables of this form; only the row labels (K=V=5), (K=V=7), and (K=V=9) survive, with no cell values.]
[Tables: binary classification results. Columns: Classifier, Features, AUC, f-score, TPR, FPR, FNR, TNR. One table for SVM and ELM, one for BLDA and stepLDA; only the row labels (K=V=5), (K=V=7), and (K=V=9) survive, with no cell values.]

| Architecture | Data augmentation | AUC | f-score | TPR | FPR | FNR | TNR |
|---|---|---|---|---|---|---|---|
| A1 | No | 86.87 | | 92.40 | 35.57 | 7.60 | 64.43 |
| A2 | No | 90.38 | | 91.81 | 19.89 | 8.18 | 80.11 |
| A3 | No | 91.76 | | 92.21 | 15.34 | 7.79 | 84.66 |
| A4 | No | 92.89 | | 91.62 | 9.89 | 8.38 | 90.11 |
| A1 | Yes | 90.16 | | 92.27 | 21.70 | 7.73 | 78.29 |
| A2 | Yes | 93.78 | | 94.54 | 12.39 | 5.45 | 87.61 |
| A3 | Yes | 95.54 | | 96.10 | 8.86 | 3.89 | 91.14 |
| A4 | Yes | 96.47 | | 95.97 | 5.23 | 4.03 | 94.77 |
H. Cecotti is with the Department of Computer Science, College of Science and Mathematics, Fresno State University, Fresno, CA, USA.
Abstract
The detection of objects that are multi-oriented is a difficult pattern recognition problem. In this paper, we propose to evaluate the performance of different families of descriptors for the classification of galaxy morphologies. We investigate the performance of the Hu moments, Flusser moments, Zernike moments, Fourier-Mellin moments, and ring projection techniques based on 1D moments and the Fourier transform. We consider two main datasets for the performance evaluation. The first dataset is an artificial dataset based on representative templates from 11 types of galaxies, which are evaluated with different transformations (noise, smoothing), alone or combined. The evaluation is based on image retrieval performance to estimate the robustness of the rotation invariant descriptors with this type of image. The second dataset is composed of real images extracted from the Galaxy Zoo 2 project. The binary classification of elliptical and spiral galaxies is achieved with pre-processing steps including morphological filtering and a Laplacian pyramid. For the binary classification, we compare the different sets of features with Support Vector Machines, Extreme Learning Machine, and different types of linear discriminant analysis techniques. The results support the conclusion that the proposed framework for the binary classification of elliptical and spiral galaxies provides an area under the receiver operating characteristic curve reaching 99.54%, proving the robustness of the approach for helping astronomers to study galaxies.
Index Terms:
rotation invariant, moment, galaxy morphologies, classification, image processing, pattern recognition
I Introduction
Pattern recognition and machine learning techniques are now reaching the stars through the use of advanced techniques to classify galaxy morphologies [1]. Given the different shapes and orientations of the galaxies, robust descriptors are required to classify them. In particular, these descriptors should be translation, scale, and rotation invariant when applied to large collections of images. The latter characteristic has been a key problem since the early days of pattern recognition. In this study, we propose to analyze the performance of rotation invariant techniques to classify different morphologies of galaxies. The morphological classification of galaxies has typically been done visually, as this difficult task requires prior experience with and knowledge of the type of images to analyze. In recent years, projects such as the Galaxy Zoo series [2, 3] have significantly enhanced the classification of galaxy morphologies. These projects have shown that large datasets of galaxy images can be analyzed by non-scientist volunteers, and that combining the analysis across multiple participants can provide reliable measurements despite the inherent subjectivity of individual evaluations. However, the substantial increase in the number of images obtained from telescopes cannot be matched by the distributed manual efforts. It is therefore necessary to provide techniques based on machine learning and image processing to classify images of galaxies [4]. Furthermore, classifiers can be directly used as a means to rank human performance for labeling, as the performance is directly linked to the level of noise in the data [5]. For instance, an average of 44 users analyzed each galaxy to determine its shape (e.g., smooth and rounded or not). A key issue related to manual labeling is the level of agreement among the different participants.
Only about 8% of the galaxies were classified with an agreement among the voters of at least 95%, so the remaining galaxies cannot be labeled with confidence [3]. These results illustrate both the difficulty of the task and the need for more reliable methods.
Galaxies can be divided into three main classes corresponding to their shape: spirals (disk dominated shape) (S), elliptical (spheroidal-looking) (E), and irregulars (I). The description of the shapes started in the 19th century with the notion of spirals from William Parsons, 3rd Earl of Rosse. This classification system is called the Hubble sequence [6]. The morphology of galaxies encodes information related to the orbital parameters, and the assembly history of galaxies can be decoded through the analysis of their morphology. Their content includes gas, dust, stars, and the central black hole. In addition, the morphology is closely related to the local environment of the galaxy because mutual interactions like tides, shocks in cluster environments, and direct mergers can all modify the shape of the galaxy’s gravitational potential [7, 8]. The tuning fork diagram is an arrangement of galaxies according to their rotation. The diagram starts with elliptical galaxies (E) and then forks into two types of spirals: with (SB) and without (S) a central bar-shaped structure (see Fig. 1). Elliptical galaxies can be subsequently decomposed in relation to their degree of ellipticity in the sky. $En$ corresponds to an elliptical galaxy with $n = 10(1 - b/a)$ for an ellipse with semi-major and semi-minor axes of lengths $a$ and $b$, respectively. In the subsequent sections, we will consider E0, E3, and E7. Spiral galaxies are typically described as a flat rotating disk, which contains stars, gas, dust, and a central concentration of stars. They can be decomposed in relation to the tightness of their spiral arms. A lower-case letter is added to the name of the class to determine the spiral structure appearance, e.g., Sa/SBa for tightly wound, smooth arms and a large, bright central bulge; Sb/SBb for less tightly wound spiral arms than Sa/SBa; Sc/SBc for loosely wound spiral arms.
In 1936, Hubble revised his classification system to include a fourth major galaxy class: S0 (lenticular) galaxies that were armless disk galaxies. They represent the transition from ellipticals to fully developed spirals. This type remains a research question due to its relationship to spiral and elliptical types. The S0 class is between E and S in the sequence. The Hubble sequence has been extended through the de Vaucouleurs system, which is a finer description of the galaxy morphologies, introducing features such as the presence of a nuclear bar [9].
The goal of this paper is threefold: 1) to provide a comprehensive description of current state-of-the-art rotation invariant descriptors, 2) to analyze the performance of these different sets of descriptors on multiple datasets (artificial and real) of galaxy morphologies with different levels of noise, and 3) to propose a framework for the classification of galaxies using rotation invariant descriptors. The remainder of the paper is organized as follows. First, related works and the various techniques to extract rotation invariant features are presented in Sections II and III. The artificial datasets, the real images, and the proposed preprocessing steps for denoising the images are detailed in Section IV. The performance of the different approaches is then presented in Section V. Finally, the impact of the results is discussed in Section VI.
II Related works
The classification of galaxy morphologies is a difficult and subjective task that requires an expert or a committee of experts to label images, as was done in the Galaxy Zoo projects. It is difficult as the details that separate two morphologies can be subtle, requiring the eye of an experienced observer to identify faint objects. Hence, the creation of the ground truth is subjective, leading to different observers assigning galaxies to different classes. In addition, the creation of a training dataset is challenging at multiple levels. First, the images must be segmented in relation to the apparent radius of the object. For instance, the radius of the object can be estimated through the Petrosian radius, which is a distance-independent measurement of galaxy profiles [10]. The light distribution of a galaxy can be described through its Sérsic [11] or Jaffe [12] profile, and these profiles have been used to determine galaxy morphologies [13]. Typical global binarization techniques that can be used for symbol detection in technical documents cannot be applied directly, given the continuity between the shape and its background in images of galaxies. Multiple approaches have been proposed. Neural networks using backpropagation were used first [14], followed by decision trees [15]. In [16], the authors compared a Naive Bayes classifier, an artificial neural network, and a decision tree using a sample of 800 galaxies, showing the interest of ensembles of classifiers. In [17], the authors used a neural network and a locally weighted regression method, with an ensemble of classifiers, and obtained 91% accuracy when considering the E, S, and I galaxy classes. The geometric shape features and direct pixel images of galaxies have been compared and classified with neural networks [18], highlighting the interest of shape descriptors. Ganalyzer was proposed as a tool for automatic galaxy image analysis [19].
An image analysis unsupervised learning algorithm using a weighted Euclidean distance was proposed for the detection of peculiar galaxies [20]. State of the art performance has been obtained in [21] thanks to data augmentation, regularization, parameter sharing, and model averaging, highlighting the performance of convolutional neural networks (CNNs), which is another type of feedforward artificial neural network.
III Methods
The problem related to the classification of objects that can be oriented in different angles can be treated in different ways. First, the problem can be simply ignored and the classifier will have to deal with features corresponding to different orientations. In such a case, considering a discriminant approach, a deep architecture should be considered to achieve a type of “or” between the different possible orientations. This approach can be judicious if there are only a few types of angles for the rotations and if these angles are equally distributed between the training and the test stage. Then, the images can be clustered in relation to the different angles. With density based approach and a large number of labeled examples covering all the different orientations, the different orientations may be ignored. The second approach is based on the estimation of the orientation of the image to reorient the images in relation to their main direction. If there is no main direction or if its estimation is difficult, it can add errors to the following processing steps. In the third way, descriptors invariant to the rotation can be extracted, i.e., the descriptors of an object will be identical, independently of the object’s orientation.
A key approach for the extraction of rotation invariant features is to transform the input image, which is originally in Cartesian coordinates into polar coordinates, changing the rotation invariant problem into a circular-shift invariant problem, where features can be obtained from the whole image (2D) or from the different rings (1D) that compose the image. Before this step, the gravity center of the chosen image must be estimated and the image should be centered on its gravity center. We consider the discrete representation of an image of size described in Cartesian coordinates by with , and . The image in polar coordinates is described by with , , and . The discrete description of the image in polar coordinate is a matrix of size , we have and with the angular and radial sampling steps defined by: , , and and . It has been shown that if normalized invariant moments in circular windows are used, then template matching in rotated images becomes similar to template matching in translated images [22].
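The change from a rotation problem to a circular-shift problem can be illustrated with a minimal NumPy sketch. The function name `to_polar`, the nearest-neighbor sampling, and the parameters `n_rings` and `n_angles` are assumptions made for illustration; they are not taken from the paper, and the image is assumed to be already centered on its gravity center.

```python
import numpy as np

def to_polar(img, n_rings, n_angles):
    """Sample a centered square image on a polar grid (nearest neighbor).

    Row r of the result is the ring of radius r + 1 pixels; column t is the
    angle 2*pi*t / n_angles. Hypothetical helper, not the paper's code.
    """
    img = np.asarray(img, dtype=float)
    c = (img.shape[0] - 1) / 2.0                       # image center (odd size assumed)
    radii = np.arange(1, n_rings + 1)[:, None]         # shape (n_rings, 1)
    theta = 2 * np.pi * np.arange(n_angles)[None, :] / n_angles
    rows = np.rint(c + radii * np.sin(theta)).astype(int)
    cols = np.rint(c + radii * np.cos(theta)).astype(int)
    return img[rows, cols]

rng = np.random.default_rng(0)
img = rng.random((33, 33))
polar = to_polar(img, n_rings=16, n_angles=32)
polar_rot = to_polar(np.rot90(img), n_rings=16, n_angles=32)

# A 90-degree rotation of the image appears as a circular shift of
# n_angles / 4 columns (the sign depends on the rotation direction),
# so any per-ring statistic that ignores column order is unchanged.
ring_means = polar.mean(axis=1)
ring_means_rot = polar_rot.mean(axis=1)
```

For arbitrary angles the shift is only approximate because of sampling and interpolation, which is one source of the discrete-case ambiguities evaluated later.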
III-A Flusser and Hu moments
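Moment invariants were introduced for pattern recognition problems by Hu [23]. They have been successfully used in a large number of 2D shape detection problems. These moments have been further analyzed and developed as complex moments [24]. A complex moment $c_{pq}$ of order $(p+q)$ of an integrable image function $f(x,y)$ is defined by:
$$c_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x+iy)^{p}\,(x-iy)^{q}\, f(x,y)\,dx\,dy$$
with $i$ denoting the imaginary unit ($i^2 = -1$). The complex moments can be represented through geometric moments $m_{pq}$:
$$c_{pq} = \sum_{k=0}^{p}\sum_{j=0}^{q}\binom{p}{k}\binom{q}{j}(-1)^{q-j}\, i^{\,p+q-k-j}\, m_{k+j,\;p+q-k-j}$$
where the two-dimensional geometric moment of order $(p+q)$ is defined by:
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p}\, y^{q}\, f(x,y)\,dx\,dy$$
After the rotation of the image by an angle $\alpha$, we obtain:
$$c'_{pq} = e^{-i(p-q)\alpha}\, c_{pq}$$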
The Flusser moments are defined as follows (second and third orders: to , fourth order to )
[TABLE]
It is worth mentioning that low-order moments are less sensitive to noise than the higher-order ones. The Flusser invariants are denoted by:
[TABLE]
The Hu invariants [23] can be defined in relation to the complex moments described before [24].
[TABLE]
The Hu invariants are denoted by:
[TABLE]
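As a hedged illustration of why products of complex moments yield rotation invariants, the sketch below computes central complex moments on a discrete image and checks two quantities, $c_{11}$ and $c_{21}c_{12} = |c_{21}|^2$, before and after a 90-degree rotation. The function name `complex_moment` and the choice of these two quantities are assumptions for illustration; the sketch does not reproduce the full invariant sets used in the paper.

```python
import numpy as np

def complex_moment(img, p, q):
    """Central complex moment: sum of (x+iy)^p (x-iy)^q f(x, y) over pixels."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    mass = img.sum()
    xc = (xs * img).sum() / mass                   # gravity center, x
    yc = (ys * img).sum() / mass                   # gravity center, y
    z = (xs - xc) + 1j * (ys - yc)
    return np.sum(z**p * np.conj(z)**q * img)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
rot = np.rot90(img)

c21, c21_rot = complex_moment(img, 2, 1), complex_moment(rot, 2, 1)
# Rotation only multiplies c_pq by a unit-modulus phase factor, so c11
# (p == q) and |c21|^2 = c21 * c12 do not change.
inv = (complex_moment(img, 1, 1).real, abs(c21) ** 2)
inv_rot = (complex_moment(rot, 1, 1).real, abs(c21_rot) ** 2)
```

The individual moment `c21` does change under rotation (its phase rotates), which is why invariants are built from phase-cancelling products rather than from single moments.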
III-B Zernike rotation invariant
Zernike provided a set of complex polynomials that form a complete orthogonal set over the interior of the circle of a radius equal to 1, (). The set of these polynomials is denoted by [25, 26]. These polynomials are defined by:
[TABLE]
where , , with and , represents the length of the vector from the the origin O to P , with .
The radial polynomial is defined by:
[TABLE]
with .
Given the relationships between and , we define two vectors and containing a subset of the possible values for and . Hence, can be estimated for each couple: , and with and can be estimated in relation to only.
For Zernike moments of order , we have different polynomials. For instance, for , we can consider the following vectors:
[TABLE]
Zernike moments of order with repetition , or , based on the projection of the image onto the orthogonal basis functions, are defined by:
[TABLE]
with .
Following a rotation of angle ,
[TABLE]
Therefore, because . So there is only the need to compute the for . and for becomes:
[TABLE]
The set of rotation invariant features becomes:
[TABLE]
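A minimal sketch of Zernike moment magnitudes follows, assuming the standard textbook definitions: the radial polynomial $R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\,((n+|m|)/2-s)!\,((n-|m|)/2-s)!}\rho^{n-2s}$ and the moment $A_{nm} = \frac{n+1}{\pi}\sum f(x,y)\,R_{nm}(\rho)\,e^{-im\theta}$ over the unit disk. The function names, the mapping of the pixel grid to $[-1, 1]$, and the pixel-area weighting are assumptions for illustration and may differ from the paper's implementation.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho), with n - |m| even and |m| <= n."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + m) // 2 - s)
                   * factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out

def zernike_moment(img, n, m):
    """Discrete Zernike moment A_nm of a square image mapped to the unit disk."""
    img = np.asarray(img, dtype=float)
    size = img.shape[0]
    coords = (2 * np.arange(size) - (size - 1)) / (size - 1)   # in [-1, 1]
    x, y = np.meshgrid(coords, coords)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0                                        # unit disk only
    basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)   # conjugate of V_nm
    pixel_area = (2.0 / (size - 1)) ** 2
    return (n + 1) / np.pi * np.sum(img[inside] * basis[inside]) * pixel_area

rng = np.random.default_rng(0)
img = rng.random((33, 33))
mag = abs(zernike_moment(img, 4, 2))
mag_rot = abs(zernike_moment(np.rot90(img), 4, 2))
```

Only the magnitude is compared because a rotation changes the phase of each moment while leaving its modulus unchanged.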
III-C Ring projection
The image is first centered on its gravity center and transformed into polar coordinate in relation to a maximum radius [27]. Each ring can be defined by its first raw moment (the mean ) and its higher central moments (variance , skewness , and kurtosis ), defining a set of rotation invariant features:
[TABLE]
The set of rotation invariant features becomes:
[TABLE]
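The ring statistics can be sketched as below, starting from an image already resampled in polar coordinates (one row per ring). The function name `ring_moment_features` is hypothetical, and the use of standardized skewness and kurtosis is an assumption, since the exact normalization is not recoverable from the text.

```python
import numpy as np

def ring_moment_features(polar):
    """Mean, variance, skewness, and kurtosis of each ring (row) of a polar image."""
    polar = np.asarray(polar, dtype=float)
    mean = polar.mean(axis=1)
    centered = polar - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var)
    safe = np.where(std > 0, std, 1.0)             # avoid division by zero on flat rings
    skew = (centered ** 3).mean(axis=1) / safe ** 3
    kurt = (centered ** 4).mean(axis=1) / safe ** 4
    return np.concatenate([mean, var, skew, kurt])

rng = np.random.default_rng(0)
polar = rng.random((8, 32))                        # 8 rings, 32 angular samples
shifted = np.roll(polar, 5, axis=1)                # a rotation is a circular shift

features = ring_moment_features(polar)
features_shifted = ring_moment_features(shifted)
```

Because each statistic ignores the order of the samples along a ring, the two feature vectors agree up to floating-point error.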
III-D FFT based rotation invariant
This approach is also based on the previous ring projection technique. In each ring, we consider the magnitudes of the Fourier coefficients, which are invariant to circular shifts.
[TABLE]
In order to use the Fast Fourier Transform, we consider as a power of 2. The set of rotation invariant features can be:
[TABLE]
with . In order to be less sensitive to the high frequencies, we consider a log selection of the different features. We select the values for , , , and the mean (bandpower) between and with . It leads to features.
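The shift invariance of the Fourier magnitudes can be checked with a short sketch. The function name `fft_ring_magnitudes` is hypothetical, and the log-spaced selection of coefficients described above is deliberately not reproduced, since its exact parameters are not recoverable here.

```python
import numpy as np

def fft_ring_magnitudes(polar):
    """Magnitude of the 1D Fourier coefficients of each ring (row)."""
    polar = np.asarray(polar, dtype=float)
    return np.abs(np.fft.rfft(polar, axis=1))      # real input: keep non-negative frequencies

rng = np.random.default_rng(0)
polar = rng.random((8, 32))                        # 32 angular samples, a power of 2
shifted = np.roll(polar, 7, axis=1)                # rotation seen as a circular shift

mags = fft_ring_magnitudes(polar)
mags_shifted = fft_ring_magnitudes(shifted)
```

A circular shift multiplies each coefficient by a unit-modulus phase term, so only the phases differ between `polar` and `shifted`.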
III-E Fourier Mellin transform invariant
The Mellin transform and the Fourier-Mellin transform (FMT) are widely used transforms in pattern recognition with the motivation to extract features invariant to rotation and scale [28, 29]. This transform has been successfully used with learning vector quantization and the K-nearest neighbors for character and symbol classification in technical documents [30, 31]. The analytical FMT of an image in polar coordinate is defined by:
[TABLE]
with and .
If we consider the image corresponding to the rotation of an angle and scale change of factor of the object described in , we have . As the two images contain the same shape, the background being set to 0, the analytical FMT of in relation to can be expressed as:
[TABLE]
The FMT descriptors of two similar objects presented in two images with different orientation will only differ by a phase factor. Therefore, a set of descriptors invariant to the rotation can be obtained by selecting the magnitude of the FMT descriptors [32].
The discrete FMT approximation is estimated for . The parameter is set to 0.5 [33]. The sampling step over is set to 1 and the approximation is given by:
[TABLE]
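A hedged sketch of a discrete FMT approximation is given below. It assumes the commonly used analytical form $M(k,v) = \frac{1}{2\pi}\int\!\!\int f(\rho,\theta)\,\rho^{\sigma - iv}\,e^{-ik\theta}\,d\theta\,\frac{d\rho}{\rho}$ with a unit radial step; the function name `fmt_magnitudes`, the argument names, and the normalization are assumptions and may differ from the paper's exact expression.

```python
import numpy as np

def fmt_magnitudes(polar, radii, K, V, sigma=0.5):
    """|M(k, v)| for k in [-K, K] and v in [-V, V] from a polar image.

    polar[j, t] is the image value at radius radii[j] (> 0) and angle
    2*pi*t / T. Assumed discretization, not the paper's exact formula.
    """
    polar = np.asarray(polar, dtype=float)
    radii = np.asarray(radii, dtype=float)
    T = polar.shape[1]
    theta = 2 * np.pi * np.arange(T) / T
    k = np.arange(-K, K + 1)
    v = np.arange(-V, V + 1)
    ang = np.exp(-1j * np.outer(k, theta))                     # shape (2K+1, T)
    rad = radii[None, :] ** (sigma - 1 - 1j * v[:, None])      # shape (2V+1, n_rings)
    # M[k, v] = (1/T) * sum_j sum_t polar[j, t] * rad[v, j] * ang[k, t]
    M = np.einsum("jt,vj,kt->kv", polar, rad, ang) / T
    return np.abs(M)

rng = np.random.default_rng(0)
polar = rng.random((8, 32))
radii = np.arange(1, 9)

mags = fmt_magnitudes(polar, radii, K=5, V=5)
mags_rot = fmt_magnitudes(np.roll(polar, 3, axis=1), radii, K=5, V=5)
```

Taking the magnitude discards the phase factor introduced by the rotation, which is what makes these descriptors rotation invariant.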
In addition, for real-valued functions such as images, the analytical FMT is symmetrical:
[TABLE]
Therefore, the FMT can be estimated for close to half of the elements in , such as . We go from:
[TABLE]
with to the estimation of only:
[TABLE]
which corresponds to elements.
The discrete FMT approximation can be expressed in Cartesian coordinates without creating an image in polar coordinates:
[TABLE]
where , , , and correspond to the set of coordinates describing the rectangle containing the object and where the position represents the gravity center of the image.
Although the magnitude of the complex moments provides rotation invariant features, the phase of the FMT carries substantial information about the shape contained in the image, and this information is lost when only the magnitude is kept. To solve this issue, a set of invariants has been proposed that keeps the phase information by normalizing with the first moments to compensate for the change of scale and the rotation [34, 35].
[TABLE]
where the normalization in relation to the orientation is achieved through and .
From the FMT, we can extract two sets of rotation invariant features:
[TABLE]
has real values while has complex values.
III-F Normalization
The different features in each set can be normalized by using transformations such as the z-score, i.e. by removing the mean and dividing by the standard deviation. Such a normalization can be beneficial in some techniques where there is no prior discriminant power on the different features. However, for the moment based techniques, it has been shown that the high order moments are more sensitive to the noise. Hence, a z-score normalization applied on all the moments can equalize the discriminant power of each moment and provide a lower performance when used with the Euclidean distance for the comparison of two objects.
IV Datasets
IV-A Artificial dataset
IV-A1 Dataset description
The artificial datasets are based on 11 types of galaxies: E0, E3, E7, S0, Sa, Sb, Sc, SBa, SBb, SBc, and I. Each type is represented by a graylevel image template of size 64 × 64. The original templates are depicted in Fig. 2 with their respective class. Each image corresponds to an artificial representative example. For the evaluation of the different methods to extract rotation invariant features, we consider six main conditions: 1) the original template images, 2) the original images with speckle noise (corresponding to the superposition of stars), 3) the original images with Gaussian noise (corresponding to the removal of high frequency information), 4) the original images filtered with a Gaussian filter with a standard deviation of the Gaussian distribution ; 5) same as condition 4 but with , and 6) the original images with the different types of noise (speckle and Gaussian), filtered (. For each condition, we consider 12 different angles for each image (). Therefore, there are 12 examples per class in conditions 1 to 5, and 144 examples for condition 6 (12 rotations, 4 types of Gaussian filtering, 3 types of noise (speckle, Gaussian, no noise)). We denote each condition by its corresponding database , e.g. for condition 1.
For the evaluation of these datasets, we consider three main cases. In the first case, we consider all the 11 classes: each class has 12 images. Here, for an example given as a query, the descriptors are expected to retrieve the 11 other rotated versions of the same image. This estimation aims at verifying the properties of rotation invariance for the different descriptors and the extent to which the calculation in the discrete case can lead to some ambiguities. In the second case, some of the initial classes are clustered together, leading to 5 classes: S0, S (containing Sa, Sb, Sc), SB (containing SBa, SBb, SBc), I, and E (containing E0, E3, E7). This analysis aims at estimating the potential confusions that may happen for instances within the same cluster. In the third case, we consider only 3 classes: S, SB, and E. For the first and third cases, there is an equal number of images for each class, hence we report the precision to retrieve the total number of images from a class or cluster, and the average precision. For the second case, we report the precision at the rank equal to the minimum number of images in a class (12). For the second and third cases, the goal of the evaluation is to determine if there exists a substantial difference between the images within a group and how these intragroup differences have an impact on the performance.
Before the feature extraction procedure, each image is normalized in the following way: the image is centered on its gravity center, the maximum radius is extracted and the image is reduced to a square of size with the gravity center being in the center of the image. The image is then resized using bilinear interpolation to and a border of 2 pixels is added around the image. It is worth noting that for the detection of galaxy morphologies, there is no expected confusion between different classes, contrary to the problem of multi-oriented character recognition that leads to expected confusions, e.g., (’E’,’M’), (’Z’,’N’).
IV-A2 Parameters for the descriptors
The number of descriptors for Hu and Flusser is fixed: 7 for and 11 for . For the estimation of and , we consider and , giving a set of 12 and 40 features for and , respectively. For the FFT, we consider 32 points for the analysis in the Fourier domain, and , which leads to 48 features for . For and , we consider the parameters (k,v) such that , , and . It provides 61, 113, and 181 features for k=5, 7, 9, respectively. For , we consider only the magnitude of the complex features to reduce the number of features.
IV-A3 Classification and performance evaluation
An image belongs to a class , , with being the number of classes of images in the dataset. Each class has examples. The classifier taking as an input the set of features representing the image returns a ranked list of examples , excluding , sorted by increasing order of distances between and the elements in the list, i.e., , with .
The performance for each image belonging to the class is estimated by the average precision:
[TABLE]
where represents the maximum number of relevant images to retrieve, i.e., the number of images belonging to the same class tested, minus the image tested itself. is defined as:
[TABLE]
The precision is defined by:
[TABLE]
It counts the number of images in the list that belong to the class . If the returned image at rank belongs to the class , i.e., it is the same class as the evaluated example, then it is 1, 0 otherwise. When multiple classes are clustered and the number of examples in each cluster is different, we report the precision at rank where .
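The retrieval evaluation can be sketched as follows, assuming the usual definitions of precision at a given rank and of average precision (the mean of the precision values at the ranks where a relevant image is returned), with the Euclidean distance for ranking. The function name `retrieval_scores` and the toy data are hypothetical, and the paper's exact formula may differ in detail.

```python
import numpy as np

def retrieval_scores(features, labels, query, rank):
    """Precision at `rank` and average precision for one query image."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(features - features[query], axis=1)
    order = np.argsort(dist, kind="stable")        # increasing distance
    order = order[order != query]                  # exclude the query itself
    rel = (labels[order] == labels[query]).astype(float)
    precision_at_rank = float(rel[:rank].mean())
    hits = np.cumsum(rel)                          # relevant images seen so far
    ranks = np.arange(1, len(rel) + 1)
    n_relevant = rel.sum()
    if n_relevant == 0:
        return precision_at_rank, 0.0
    average_precision = float((hits / ranks * rel).sum() / n_relevant)
    return precision_at_rank, average_precision

# Toy example: two well-separated classes with three images each.
features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = [0, 0, 0, 1, 1, 1]
p_at_2, ap = retrieval_scores(features, labels, query=0, rank=2)
```

In this toy case both relevant images are returned first, so both scores are maximal; a relevant image pushed down the ranked list lowers the average precision.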
IV-B Galaxy Zoo dataset
IV-B1 Dataset description
In this problem, we consider the classification of real images of galaxies extracted from the Galaxy Zoo 2 (GZ2) dataset. All the details about the Galaxy Zoo dataset can be found in [3]. In this dataset, a large number of images were given to participants who had to answer a series of questions (11 tasks and 37 answers) for each image. The ground truth of the images corresponds to the score across all the participants, indicating a confidence score for all the different questions. We consider a subset of the whole dataset that contains 61578 color images. Among the questions, we select images that do not contain an “odd” element (i.e., the presence of a ring, a disturbed or irregular galaxy) with a confidence of at least 0.9. This limits the total number of relevant images to 22481. In the present case, we limit the scope of the classification to elliptical versus spiral galaxies, which have been labeled with a confidence of at least 0.9, corresponding to 2517 elliptical and 2908 spiral galaxies. With the intersection of non-odd images, we finally obtain 1545 elliptical and 884 spiral galaxies. Some representative images are depicted in Fig. 3.
IV-B2 Pre-processing
Each image from the dataset has the size . First, the images are transformed to graylevel and cropped, as we keep the center of the image with a size of . The images are then downsampled to . As the images are noisier and may contain multiple elements such as stars, the image is binarized with the global thresholding Otsu method, which separates pixels into foreground and background classes by minimizing the intra-class intensity variance [36]. Then, using morphological operators, we first apply the closing operation with a structuring element of size (a square) followed by a dilation using a structuring element of size (a disk) to compensate for the problems related to the binarization of dark areas that still contain information. The binarization mask is then used to select the foreground of the original image. Finally, the image is centered on its gravity center , the border is removed while keeping the gravity center in the center of the image, resized to , and a border of 2 pixels is added around the image, leading to an image of size .
After normalizing the image, a Laplacian pyramid with 4 levels is applied on the image [37]. The original image is convolved with a Gaussian kernel with a standard deviation of the Gaussian distribution set to 2. The Laplacian is then computed as the difference between the original image and the low pass filtered image. This process is repeated 4 times. This set of 4 images represents the input for the feature extraction part: the set of rotation invariant features is the concatenation of the rotation invariant feature sets applied to each of the 4 images. For each set of descriptors, the number of features is the same as for the artificial datasets, but multiplied by a factor 4, as there are multiple images given as an input. The different preprocessing steps are depicted in Fig. 4.
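The decomposition can be sketched with NumPy only, under two assumptions that the text does not settle: no downsampling is applied between levels, and each new level is computed from the low-pass image of the previous one. The helper names `gaussian_blur` and `laplacian_pyramid` are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma=2.0):
    """Separable Gaussian low-pass filter with reflect padding."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows, then along columns; "valid" restores the input size.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, tmp)

def laplacian_pyramid(img, levels=4, sigma=2.0):
    """Return `levels` band-pass images and the final low-pass residual."""
    current = np.asarray(img, dtype=float)
    bands = []
    for _ in range(levels):
        low = gaussian_blur(current, sigma)
        bands.append(current - low)                # detail removed by the blur
        current = low
    return bands, current

rng = np.random.default_rng(0)
img = rng.random((64, 64))
bands, residual = laplacian_pyramid(img)
reconstructed = sum(bands) + residual              # the decomposition telescopes
```

Each band keeps the detail removed by one additional blur, so the rotation invariant descriptors applied to the 4 bands see the galaxy at progressively coarser scales.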
IV-B3 Performance evaluation
For the binary classification of elliptical versus spiral galaxies, we consider a 10-fold cross-validation procedure, with one block being used for the evaluation and the remaining 9 blocks for training. For each partition, the training dataset contains 1386 and 792 examples for the elliptical and spiral galaxies, respectively, while the test dataset contains 154 and 88 examples. Given the unbalanced dataset in terms of number of examples per class, we report the area under the receiver operating characteristic curve (AUC) [38]. In addition, we report the true positive rate (TPR), the false positive rate (FPR), the false negative rate (FNR), the true negative rate (TNR), and the f-score. All the scores reported in the subsequent sections correspond to the mean and standard deviation across the 10 partitions. They are defined by:
$$\mathrm{TPR} = \frac{TP}{P},\quad \mathrm{FPR} = \frac{FP}{N},\quad \mathrm{FNR} = \frac{FN}{P},\quad \mathrm{TNR} = \frac{TN}{N},\quad \text{f-score} = \frac{2\,TP}{2\,TP + FP + FN}$$
where TP, FP, FN, and TN correspond to the true spirals, false spirals, false ellipticals, and true ellipticals, respectively. P and N correspond to the total number of spiral and elliptical galaxies, respectively.
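As a small worked example, the rates can be computed from the four confusion counts, with spirals as the positive class. The function name `binary_metrics` and the counts below are hypothetical (chosen to match a test fold of 88 spirals and 154 ellipticals), and the f-score is assumed to be the usual F1 score.

```python
def binary_metrics(tp, fp, fn, tn):
    """Rates from confusion counts; spirals are the positive class."""
    p = tp + fn                                    # total spirals
    n = fp + tn                                    # total ellipticals
    return {
        "TPR": tp / p,
        "FPR": fp / n,
        "FNR": fn / p,
        "TNR": tn / n,
        "f-score": 2 * tp / (2 * tp + fp + fn),    # assumed to be F1
    }

# Hypothetical counts for one fold: 88 spirals and 154 ellipticals.
scores = binary_metrics(tp=80, fp=4, fn=8, tn=150)
```

By construction TPR + FNR = 1 and FPR + TNR = 1, which gives a quick consistency check on any reported row of results.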
IV-B4 Classification
For the binary classification, we consider the following supervised learning techniques: Support Vector Machines (SVMs) [39], Bayesian Linear Discriminant Analysis (BLDA) [40], stepwise Linear Discriminant Analysis (stepLDA) [41], and Extreme Learning Machine (ELM), a type of feedforward neural network where the parameters of the hidden units do not need to be tuned [42]. These techniques provide a fast estimation of the classifiers with a minimum number of hyper-parameters [43]. The regular Linear Discriminant Analysis (LDA) is not considered as it requires some regularization technique, because the covariance matrices were not always well conditioned in pilot tests with some descriptors. For the ELM, we consider 1000 hidden units and a sigmoid function as the activation for the random projections, which are normalized to obtain an orthonormal basis of the random weights [44].
As pattern recognition systems for the classification of images using convolutional neural networks represent the state of the art [45], we consider four architectures as a baseline to establish the relevance of rotation invariant descriptors. In each architecture, we use the Adam optimization algorithm [46]; the minibatch size is set to 64; the activation function for all the units is the rectified linear unit (ReLU) [47]; the maximum number of epochs is set to 100; the initial learning rate is set to 10e-4. The input layer is of size 64 × 64 while the output layer has 2 units (one for each class). The first architecture (A1) corresponds to a regular multilayer perceptron with only a single fully connected (FC) hidden layer of 100 units. The second architecture (A2) adds a convolutional layer with 5 feature maps, filters of size 5 × 5, and a stride of 2 in each direction, with no padding. The third architecture (A3) adds another convolutional layer with the same properties as the previous convolutional layer, with 25 feature maps. Finally, the last architecture (A4) adds another convolutional layer, with 125 feature maps, resulting in an architecture with 3 convolutional layers and a fully connected hidden layer. A2, A3, and A4 are CNNs. For each architecture, we consider two training conditions, with and without data augmentation based on the addition of rotated images (, ).
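The spatial size of the feature maps in A2, A3, and A4 follows from the filter size, stride, and padding given above. The short sketch below works through the arithmetic, assuming the usual floor convention for the output size and no pooling layers between the convolutions (none are mentioned in the text). The function names are hypothetical.

```python
def conv_output_size(size, kernel=5, stride=2, padding=0):
    # Standard output-size formula for a convolution along one dimension.
    return (size + 2 * padding - kernel) // stride + 1

def feature_map_shapes(input_size=64, maps=(5, 25, 125)):
    # (number of maps, height, width) after each successive convolutional layer.
    shapes = []
    size = input_size
    for n_maps in maps:
        size = conv_output_size(size)
        shapes.append((n_maps, size, size))
    return shapes

print(feature_map_shapes())
# [(5, 30, 30), (25, 13, 13), (125, 5, 5)]
```

Under these assumptions, the fully connected hidden layer receives 4500 inputs in A2, 4225 in A3, and 3125 in A4, compared with 4096 raw pixels in A1.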
V Results
This section presents the results on the artificial datasets, evaluated from an image retrieval perspective, and the results for the binary classification of elliptical versus spiral galaxies using real images.
V-A Artificial datasets
The results for each condition and for each set of features are presented in Tables I, II, and III. The first table corresponds to the original images from the chosen templates. These results highlight the problem related to the high dimensionality of the input feature set when used with the Euclidean distance, as there are shapes from different classes that are relatively similar. For , , and , the best performance is obtained by Hu, Flusser moments, and the ring projection with FFT. These results show the low impact of Gaussian filtering on the images at the given scale. For , with speckle noise, the best performance is obtained with Flusser moments with a precision of 98.83%. However, with the addition of Gaussian noise, the best performance is achieved with the ring projection with FFT with a precision of 98.97%. For when all the variations of the images are combined (rotation, noise, Gaussian filtering), the best precision reaches only 90.59% with the ring projection and FFT approach. It is worth noting the low performance of the Hu and Flusser approaches, which provide only 58.77% and 59.82%, respectively, showing their low discriminant power when noise is added to the images. The best performance with the FMT is achieved with K=V=7 with 87.26% by considering only the magnitude of the moments, while the magnitude of the normalized moments provides only 75.49%. In both cases, the choice of 7 for K and V provides a better performance than 5 and 9.
When the classes are clustered, the Euclidean distance performs poorly with the descriptors based on Hu, Flusser, and Zernike, with less than 80% in average precision. The best method is with an average precision of 83.36%, followed by the descriptors with K=V=9, with 82.30%. These results highlight the robustness of the Fourier-Mellin based descriptors when there exists a large intra-class variability.
V-B Galaxy Zoo
The performance for the different types of features is detailed in Tables IV and V, with values in %. The best performance for each table is given in bold. The best performance is obtained with the combination of the ring projection and the stepLDA classifier, with an average AUC=99.54. The second best technique is the ring projection with FFT in each ring combined with BLDA. The performance of ELM, with only AUC=97.09, suggests that there is no need to add an extra level in the architecture and that simple linear classifiers are enough given the provided input features. The ranking of the different descriptors is relatively similar to the one observed on the artificial dataset (): Hu and Flusser based moments do not provide the best descriptors.
The results for the artificial neural networks are presented in Table VI. The performance with only a single hidden layer reaches an AUC of 90.36% whereas the performance increases when convolutional layers are added. The best performance is obtained with the architecture using 3 convolutional layers, with an AUC of 96.81%. This performance is inferior to what could be obtained with the best rotation invariant descriptors, e.g., , , but it remains superior to and . These results indicate that the variability that exists within the images can be captured by the neural networks, and that it is better modeled through the use of convolutional layers. With the addition of rotated images in the training dataset, the performance substantially improved in all the architectures, reaching an AUC of 99.04% with A4, which is very close to the best performance obtained with rotation invariant descriptors. These results suggest that despite having images with various orientations in the training database, it is necessary to enrich the training database to improve the performance.
V-C Relationships between classification and human confidence
The selected images in the Galaxy Zoo 2 dataset included only images where the confidence score was above 90%. Fig. 5 represents the relationship between the global confidence and the classification score obtained with and stepLDA. The distribution of the number of examples in relation to the chosen threshold highlights the substantial increase of the number of examples in the dataset. As expected, the performance of the classifier decreases as a function of the chosen threshold due to the addition of more difficult images. The AUC remains nevertheless above 97% for all the chosen thresholds.
VI Discussion
A comprehensive description of the main families of rotation invariant feature descriptors has been given in this paper, including the moments of Hu, Flusser, Zernike, and Fourier-Mellin. The ring projection technique has been defined using both 1D moments and the Fourier transform through the analysis of the log decomposition of the bandpowers. These techniques have been evaluated and compared on different datasets relative to the classification of galaxy morphologies, highlighting the strengths and pitfalls of these approaches. Using these techniques, a complete framework has been proposed for fast classification with low level features based on a Laplacian pyramid and pre-processing based on morphological filtering. These techniques have been compared with convolutional neural networks, which represent the state of the art for image classification tasks.
The results obtained with the artificial dataset confirm the interest of all the different techniques as descriptors when the images are clean. However, there is a tradeoff between the dimensionality of the input vectors when used with distances, and the level of noise in the images. The results confirm the issues with Hu and Flusser moments in noisy images. The results obtained with the real Galaxy Zoo dataset show that images of galaxies can be discriminated with high performance. Following the same pattern of performance as with the artificial images, the Hu and Flusser moments provide the worst accuracy, albeit above 80%. There was not a significant difference of performance across the binary classifiers, as all the results remain in the range of %. The worst classifier was ELM, with a maximum AUC of 97.09%. These results suggest that the use of morphological filtering for denoising images, followed by a Laplacian pyramid as input features and the use of rotation invariant descriptors, gives state of the art results that can be exploited by astronomers for galaxy morphology classification. These results are higher than the results obtained with the convolutional neural networks, which reached about 97% when data augmentation was not used. However, the results are relatively similar (above 99%) when data augmentation is used in the training dataset, confirming that CNNs are robust classifiers in such an application despite the large variability that exists across examples in relation to their different orientations. In both cases, the image processing pipeline on one hand, and the neural network architecture and its parameters on the other hand, are designed in relation to the type of input image to classify.
While this paper deals with the effect of rotation, the scale and the normalization of the images have a key impact on the choice of the method. Before the extraction of descriptors that are invariant to rotation, the first problem to solve is the invariance to translation, which is typically handled by normalizing the image with respect to its gravity center. In noisy images, this step can have a significant impact on the results. In addition, the normalization of the scale of the image is a key step for the methods that rely directly on the radius, such as the ring projection. The difficulty related to the estimation of the maximum radius of the image can prevent the use of the ring projection and the direct FFT approach on the polar representation of the images, due to the variability of the optimal radius across images.
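The translation normalization mentioned above can be sketched as follows. The gravity center is computed here as the intensity-weighted mean of the pixel coordinates (the ratio of first-order to zeroth-order raw moments), and the image is then shifted so that this point lands on the central pixel. The function names are hypothetical, and the circular shift with rounding to whole pixels is a simplification for illustration that assumes an empty background near the borders.

```python
import numpy as np

def gravity_center(image):
    # Intensity-weighted mean of the (row, column) coordinates.
    img = np.asarray(image, dtype=float)
    total = img.sum()
    rows, cols = np.indices(img.shape)
    return (rows * img).sum() / total, (cols * img).sum() / total

def center_on_gravity(image):
    # Shift the image so that its gravity center lands on the central pixel.
    # np.roll wraps around the borders (assumption: the background there is empty).
    img = np.asarray(image, dtype=float)
    r, c = gravity_center(img)
    dr = int(round((img.shape[0] - 1) / 2 - r))
    dc = int(round((img.shape[1] - 1) / 2 - c))
    return np.roll(img, (dr, dc), axis=(0, 1))
```

Since every pixel contributes to the weighted mean, residual noise or a neighbouring object in the frame pulls the estimated center away from the galaxy, which is one way the sensitivity noted above can arise.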
The creation of a pattern recognition system for object classification is difficult as it requires multiple stages, from feature selection, reduction, and/or extraction to the classification stage. In relation to a given problem, features can be extracted analytically or through deep learning approaches. The latter approach has obtained great performance compared to the former in computer vision. Yet, the use of descriptors that are invariant to translation and rotation makes it possible to estimate the differences between the different classes of objects without requiring a large number of labeled examples. CNNs are relevant classifiers for images, but it remains difficult to understand what happens within the last hidden layers for extracting high level features. Furthermore, CNNs are able to absorb a large number of variations (translation, rotation, scaling,…) through data augmentation, which requires knowing how to enrich the initial training database. CNNs have been recently used for classifying radio galaxies [48] and for the Galaxy Zoo projects [21]. Thanks to the low level features that can be extracted by CNN models, it would be possible to replace the Laplacian pyramid that has been used in this work by feature maps obtained from CNNs. Transfer learning through the use of CNN based features may provide a better feature set on which to apply translation, rotation, and scale invariant descriptors. Furthermore, research in CNN architectures should enable the use of functions connecting layers to directly extract rotation invariant features without introducing all the different possible orientations as input. Other novel approaches such as capsule networks provide promising results for galaxy morphology classification [49].
Most of the frameworks for the classification of galaxies remain supervised. Data labeled by regular citizens can be used for training machine learning systems, and the performance depends on the quality of the data, which can be improved by using images that have a high degree of agreement across participants [50]. The involvement of the participants is currently separated from the machine learning part, preventing the use of active learning techniques that can dynamically combine manual labeling and machine learning in order to minimize the amount of manual work while keeping a high reliability in the decisions [51]. The definition of efficient descriptors for the classification is one step toward the definition of efficient distances that can be considered for graph based semi-supervised learning, which can estimate the characteristics of galaxies using labeled images, unlabeled images, and queries to human participants in an active learning setting.
VII Conclusion
In this paper, we have provided a comprehensive description of the main families of rotation invariant descriptors that can be considered for the analysis of galaxy morphologies. Six main techniques have been presented and compared on different datasets, with artificial images in the context of content retrieval, and for the classification of images using real images of galaxies. The results have been compared with convolutional neural networks and have highlighted the small difference in performance between the descriptors. The ring based methods using 1D moments or the Fast Fourier Transform have provided the best performance when used with linear binary classifiers. With an AUC reaching 99.54%, these encouraging results suggest the possibility of determining finer galaxy morphological characteristics by using the proposed image processing framework combining a Laplacian pyramid, rotation and translation invariant descriptors, and state of the art binary classifiers. Future work will deal with the effect of the errors related to the estimation of the gravity center and the presence of external elements within the shape that can disturb its analysis.
References
- [1] N. M. Ball and R. J. Brunner, “Data mining and machine learning in astronomy,” International Journal of Modern Physics D, vol. 19, no. 7, pp. 1049–1106, 2010.
- [2] C. Lintott, K. Schawinski, S. Bamford, A. Slosar, K. Land, D. Thomas, E. Edmondson, K. Masters, R. C. Nichol, M. J. Raddick, A. Szalay, D. Andreescu, P. Murray, and J. Vandenberg, “Galaxy Zoo 1: data release of morphological classifications for nearly 900 000 galaxies,” Monthly Notices of the Royal Astronomical Society, vol. 410, pp. 166–178, Jan. 2011.
- [3] K. W. Willett, C. J. Lintott, S. P. Bamford, K. L. Masters, K. R. Simmons, Brooke D. Casteels, E. M. Edmondson, L. F. Fortson, S. Kaviraj, W. C. Keel, T. Melvin, R. C. Nichol, M. J. Raddick, K. Schawinski, R. J. Simpson, R. A. Skibba, A. M. Smith, and D. Thomas, “Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey,” Monthly Notices of the Royal Astronomical Society, vol. 435, p. 1–29, 2013.
- [4] L. Shamir, “Automatic morphological classification of galaxy images,” Monthly Notices of the Royal Astronomical Society, vol. 399, no. 3, pp. 1367–1372, 2009.
- [5] L. Shamir, D. Diamond, and J. Wallin, “Leveraging pattern recognition consistency estimation for crowdsourcing data analysis,” IEEE Trans. on Human-Machine Systems, vol. 46, pp. 474–480, June 2016.
- [6] E. P. Hubble, “Extra-galactic nebulae,” Astrophysical Journal, vol. 64, pp. 321–369, 1926.
- [7] R. J. Buta, “Galaxies: Classification,” in Encyclopedia of Astronomy and Astrophysics, P. Murdin, Ed. Bristol: Institute of Physics Publishing, 2001.
- [8] R. J. Buta, “Galaxy morphology,” in Planets, Stars, and Stellar Systems, T. D. Oswalt and W. C. Keel, Eds., 2011, vol. 6, pp. 1–89.
