Transition Subspace Learning based Least Squares Regression for Image Classification
Zhe Chen, Xiao-Jun Wu, and Josef Kittler

TL;DR
This paper introduces TSL-LSR, a novel method for image classification that learns a transition subspace with low-rank constraints to better preserve data structure and improve discriminative power.
Contribution
The paper proposes a transition subspace learning approach with low-rank constraints for multicategory image classification, addressing overfitting and data structure preservation.
Findings
Outperforms state-of-the-art algorithms on multiple datasets
Effectively captures intrinsic data structures
Reduces overfitting in projection learning
Abstract
Only learning one projection matrix from original samples to the corresponding binary labels is too strict and will consequently lose some intrinsic geometric structures of data. In this paper, we propose a novel transition subspace learning based least squares regression (TSL-LSR) model for multicategory image classification. The main idea of TSL-LSR is to learn a transition subspace between the original samples and binary labels to alleviate the problem of overfitting caused by strict projection learning. Moreover, in order to reflect the underlying low-rank structure of the transition matrix and learn a more discriminative projection matrix, a low-rank constraint is added to the transition subspace. Experimental results on several image datasets demonstrate the effectiveness of the proposed TSL-LSR model in comparison with state-of-the-art algorithms.
Table I. Brief description of the datasets.

| Dataset | Classes | Features | Total Num. | Training Num. |
|---|---|---|---|---|
| AR | 100 | 540 | 2600 | 1000 |
| CMU PIE | 68 | 1024 | 11554 | 680 |
| Feret | 200 | 1600 | 1400 | 800 |
| COIL-20 | 20 | 1024 | 1440 | 200 |

Table II. Classification accuracies (mean ± std, %) of different algorithms on the four datasets.

| Algorithms | AR | CMU PIE | Feret | COIL-20 |
|---|---|---|---|---|
| LRC [7] | 74.12 ± 1.50 | 75.67 ± 1.01 | 46.58 ± 1.33 | 92.30 ± 1.15 |
| CRC [8] | 93.36 ± 0.53 | 86.39 ± 0.60 | 57.07 ± 1.79 | 89.09 ± 1.48 |
| ProCRC [9] | 95.28 ± 0.41 | 89.00 ± 0.37 | 64.40 ± 2.54 | 90.61 ± 0.95 |
| DLSR [10] | 93.79 ± 0.50 | 87.54 ± 0.79 | 71.15 ± 1.27 | 93.27 ± 1.43 |
| ReLSR [11] | 94.53 ± 0.56 | 88.18 ± 0.79 | 72.98 ± 2.19 | 93.65 ± 1.94 |
| GReLSR [12] | 95.18 ± 0.74 | 86.88 ± 0.72 | 70.38 ± 2.14 | 90.98 ± 1.62 |
| RLSL [13] | 94.21 ± 0.35 | 87.70 ± 0.63 | 68.33 ± 1.57 | 93.75 ± 1.87 |
| TSL-LSR (ours) | 96.34 ± 0.43 | 89.92 ± 0.35 | 85.73 ± 1.39 | 94.34 ± 1.02 |
Xiao-Jun Wu is the corresponding author. Zhe Chen and Xiao-Jun Wu are with the School of Internet of Things, Jiangnan University, Wuxi 214122, China. E-mail: [email protected], [email protected]
Josef Kittler is with the Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, U.K. E-mail: [email protected]
This paper is under consideration at Pattern Recognition Letters.
Index Terms:
Least squares regression, transition subspace learning, low-rank structure constraint, multicategory image classification.
I Introduction
Least squares regression (LSR) is a very popular tool in the field of pattern recognition because of its computational efficiency and mathematical tractability. Many modified models, including LASSO regression [1], partial LSR [2], least squares support vector machine [3], kernel ridge regression [4], and weighted LSR [5], have been proposed for classification tasks. Besides, some representation based classification algorithms, such as sparse representation based classification (SRC) [6], linear regression based classification (LRC) [7], collaborative representation based classification (CRC) [8] and probabilistic CRC (ProCRC) [9], are also formulated under the LSR model. These algorithms have achieved varying degrees of success in improving classification accuracy.
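Consider $n$ training samples $\{x_i\}_{i=1}^{n}$ from $c$ classes, where $x_i \in \mathbb{R}^{d}$ denotes a sample vector and $d$ is the dimensionality of the sample. Collecting these samples as a training matrix $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$, the standard LSR model can be defined as follows
$$\min_{Q} \; \|QX - H\|_F^2 + \lambda \|Q\|_F^2 \qquad (1)$$
where $\lambda$ is a regularization parameter and $Q \in \mathbb{R}^{c \times d}$ is the projection matrix to be learned. $H = [h_1, \dots, h_n] \in \mathbb{R}^{c \times n}$ is the binary label matrix. The $i$th column of $H$, i.e., $h_i$, is the label vector of sample $x_i$. Suppose $x_i$ is from the $k$th class ($k \in \{1, \dots, c\}$), then only the $k$th element of $h_i$ is equal to 1 and all the others are 0. Obviously, problem (1) has a closed-form solution $Q = HX^{T}(XX^{T} + \lambda I)^{-1}$. For a given test sample $y$, LSR predicts its label as $\arg\max_{j} (Qy)_j$, where $(Qy)_j$ is the $j$th entry of $Qy$.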
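As a minimal sketch of the standard LSR classifier described above, the following NumPy code builds a zero-one label matrix, computes the closed-form ridge solution, and predicts the class with the largest regression response. The function names (`one_hot`, `fit_lsr`, `predict_lsr`), the one-column-per-sample layout, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Build a (num_classes x n) zero-one label matrix, one column per sample."""
    H = np.zeros((num_classes, labels.size))
    H[labels, np.arange(labels.size)] = 1.0
    return H

def fit_lsr(X, H, lam):
    """Closed-form ridge solution Q = H X^T (X X^T + lam I)^{-1}, with X of shape (d, n)."""
    d = X.shape[0]
    A = X @ X.T + lam * np.eye(d)           # symmetric, invertible for lam > 0
    return np.linalg.solve(A, X @ H.T).T    # solve A Q^T = X H^T, then transpose

def predict_lsr(Q, Y):
    """Assign each column of Y to the class with the largest regression response."""
    return np.argmax(Q @ Y, axis=0)

# Toy data (hypothetical): two classes clustered around different axes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
centers = np.array([[3.0, 0.0], [0.0, 3.0]])
X = (centers[labels] + 0.1 * rng.standard_normal((40, 2))).T   # shape (2, 40)

H = one_hot(labels, num_classes=2)
Q = fit_lsr(X, H, lam=0.01)
pred = predict_lsr(Q, X)
```

Solving the linear system instead of forming the inverse explicitly is a numerical design choice of this sketch; it yields the same solution as the closed-form expression.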
In recent years, researchers developing LSR have focused more on learning relaxed regression targets to replace zero-one labels. For example, Xiang et al. [10] presented a discriminative least squares regression (DLSR) model by utilizing a technique called ε-dragging. The idea of DLSR was to enlarge the margins between the true and the false classes as much as possible after the original samples are projected into the corresponding label space, which intuitively facilitates classification. Retargeted LSR (ReLSR) [11] directly learned the regression targets from data, which can guarantee that all samples are correctly classified with large margins. Wang et al. [12] proposed a new groupwise ReLSR (GReLSR) model by introducing a groupwise regularization term that encourages samples of the same class to have similar translation values.
However, directly minimizing the regression error between the projection features and labels is too restrictive. A single projection matrix is not enough to carry sufficient discriminative information. Besides, both the ε-dragging and the margin constraint techniques can also enlarge the distances between regression targets of the same class. In addition to learning relaxed targets, RLSL [13] proposed to learn a latent feature subspace that can be regarded as an intermediate between the original samples and binary labels. Nevertheless, RLSL did not take into account the structural characteristics of the learned latent subspace.
In this paper, a novel transition subspace learning based LSR (TSL-LSR) model is proposed for multiclass classification. The main advantage of TSL-LSR is that it learns a transition subspace, which can preserve more underlying structural information in the learned projection. Specifically, the contributions of TSL-LSR can be highlighted as follows:
(1) We propose to learn a transition subspace to avoid the problem of over-fitting, which is more flexible than learning projection from samples to zero-one labels directly.
(2) TSL-LSR first transforms the original samples into a transition subspace, then transforms the transition subspace into the space of binary labels. Hence, there are two projection matrices to be learned in the TSL-LSR model and both of these two matrices are used for classification.
(3) To guarantee consistency and global optimum of transformation learning, two projection matrices are learned in a joint framework.
(4) A low-rank constraint is imposed on the transition matrix to capture the underlying feature structures (low-rank structure) of different classes.
(5) The low-rank transition subspace can also be extended to LSR models based on slack targets, where it helps to learn similar and compact within-class regression targets.
II Transition Subspace Learning based Least Squares Regression (TSL-LSR)
II-A The Model of TSL-LSR
Since binary labels already have enough discriminability for classification, TSL-LSR still uses the zero-one labels as the final regression targets. But unlike DLSR, ReLSR and GReLSR, TSL-LSR learns discriminative projections by introducing a low-rank transition subspace to avoid the loss of structural information, rather than relaxing the binary regression targets. The model of TSL-LSR can be formulated as
[TABLE]
Here the three weighting coefficients are positive regularization parameters. The variables to be optimized are the transition matrix, whose row dimensionality is the dimensionality of the transition subspace, and the two projection matrices. The nuclear norm $\|\cdot\|_*$ (the sum of the singular values of a matrix) imposes the low-rank constraint on the transition matrix.
The consequence of introducing the transition subspace is that TSL-LSR must learn two projection matrices in one model. However, this is more flexible than learning one projection matrix. The first projection matrix transforms the original samples into the transition subspace, and the second transforms the transition subspace into the space of binary labels. The reasons for adding a low-rank constraint on the transition subspace can be summarized as follows:
(1) The final regression targets, i.e., the label matrix, are low-rank (the rank equals the number of classes), thus it is reasonable to assume that the transition subspace is also low-rank.
(2) For real-world image classification tasks, images are often collected in realistic conditions, so that they are subject to noise, which has an adverse effect on classification. Thus we assume that the features obtained after the first-step projection are heterogeneous. We try to recover a low-rank subspace from the corrupted features based on the assumption that the clean data structures are approximately drawn from a low-rank subspace. As a result, more useful structure information of images can be captured during the transformation learning process. The proposed learning framework (2) is illustrated in Fig. 1. As shown in Fig. 1, the features extracted by our TSL-LSR model include two parts: the first-step features (samples mapped into the transition subspace) and the second-step features (transition features mapped into the label space).
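As a rough illustration of the verbal description above, the following sketch evaluates one plausible two-step objective: a term fitting the projected samples to the transition matrix, a term fitting the projected transition matrix to the binary labels, a nuclear-norm penalty on the transition matrix, and Frobenius-norm regularization of both projections. The exact terms, their weighting, and every name used here (`W`, `Q`, `T`, `lam1`, `lam2`, `lam3`, `assumed_tsl_objective`) are assumptions made for illustration and should not be read as the paper's Eq. (2).

```python
import numpy as np

def nuclear_norm(M):
    """Nuclear norm: the sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def assumed_tsl_objective(X, H, W, Q, T, lam1, lam2, lam3):
    """Value of an ASSUMED two-step objective (not the paper's exact formula).

    Shapes: X (d, n) samples, H (c, n) zero-one labels,
            W (m, d) first projection, T (m, n) transition matrix,
            Q (c, m) second projection.
    """
    fit_transition = np.linalg.norm(W @ X - T, "fro") ** 2   # samples -> transition subspace
    fit_labels = np.linalg.norm(Q @ T - H, "fro") ** 2       # transition subspace -> labels
    low_rank = nuclear_norm(T)                               # low-rank prior on T
    ridge = np.linalg.norm(W, "fro") ** 2 + np.linalg.norm(Q, "fro") ** 2
    return fit_transition + lam1 * fit_labels + lam2 * low_rank + lam3 * ridge
```

Under this assumed form, setting the nuclear-norm weight to zero and the transition dimensionality equal to the number of classes would reduce the model to two chained ridge regressions, which is one way to see what the low-rank term adds.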
II-B Optimization of TSL-LSR
The objective function in (2) cannot be directly optimized because the variables (the transition matrix and the two projection matrices) are interdependent. Therefore, we use the alternating direction method of multipliers (ADMM) [14] to solve the optimization problem. We first introduce an auxiliary variable to make problem (2) separable and give its augmented Lagrangian function as
[TABLE]
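Here the augmented Lagrangian involves a Lagrangian multiplier and a penalty parameter. Each of the four variables (the transition matrix, the two projection matrices, and the auxiliary variable) is updated in turn with the other variables fixed.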
Update : By fixing variables , and , can be obtained by minimizing the following problem
[TABLE]
We set the derivative of with respect to to zero, and obtain the following closed-form solution
[TABLE]
Update : can be obtained by minimizing the following problem
[TABLE]
which has a closed-form solution as
[TABLE]
Update : can be obtained by minimizing the following problem
[TABLE]
Likewise, has a closed-form solution
[TABLE]
Update : can be obtained by minimizing the following problem
[TABLE]
Formula (10) can be optimized by the singular value thresholding algorithm [15]. The optimal solution of (10) is
[TABLE]
Here the solution is expressed through the singular value shrinkage operator. The complete optimization procedure is summarized in Algorithm 1.
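The singular value shrinkage step can be sketched in NumPy as follows. The function soft-thresholds the singular values of its input, which gives the minimizer of `tau * ||Z||_* + 0.5 * ||Z - M||_F^2`. The name `singular_value_shrinkage` and the argument `tau` are assumptions; in an ADMM iteration the threshold would be derived from the nuclear-norm weight and the penalty parameter, and the exact choice depends on the formulation.

```python
import numpy as np

def singular_value_shrinkage(M, tau):
    """Soft-threshold the singular values of M by tau and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # singular values below tau become zero
    return (U * s_shrunk) @ Vt            # same as U @ diag(s_shrunk) @ Vt
```

Because small singular values are set exactly to zero, the output typically has lower rank than the input, which is how the nuclear-norm term promotes a low-rank transition matrix.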
Next, we analyze the computational complexity of Algorithm 1. Following [16], the main time-consuming steps of Algorithm 1 are
(1) Matrix inversion in Eqs. (5), (7), and (9).
(2) Singular value decomposition in Eq. (11).
The complexity of pre-computing in Eq. (5) is . The complexity of computing each of in Eq. (7) and in Eq. (9) is . The complexity of singular value decomposition in Eq. (11) is . Thus the final time complexity for Algorithm 1 is about , where is the number of iterations.
II-C Classification
Once the two optimal projection matrices are obtained, we can use them to classify test samples. Given a new test sample, its regression result is computed with the learned projections. Then, the nearest-neighbor (NN) classifier is used to predict the label of the test sample.
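A minimal sketch of this classification step is given below, assuming that the regression result of a sample is obtained by applying the first projection and then the second, and that the NN classifier compares test features with the projected training samples in Euclidean distance. The names `project` and `nn_classify` and the column-wise feature layout are hypothetical.

```python
import numpy as np

def project(Q, W, X):
    """Map samples (columns of X) through both learned projections (assumed composition)."""
    return Q @ (W @ X)

def nn_classify(train_feats, train_labels, test_feats):
    """1-nearest-neighbour in Euclidean distance; features are stored column-wise."""
    # Differences between every test column and every training column.
    diff = test_feats[:, :, None] - train_feats[:, None, :]   # (c, n_test, n_train)
    d2 = np.sum(diff ** 2, axis=0)                            # squared distances, (n_test, n_train)
    return train_labels[np.argmin(d2, axis=1)]
```

The pairwise-difference array grows with the product of the numbers of test and training samples, so for datasets as large as CMU PIE a chunked or tree-based neighbour search may be preferable.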
III Experiments
We compare the proposed TSL-LSR model with four recent LSR based classification methods, namely DLSR [10], ReLSR [11], GReLSR [12], and RLSL [13], and three representation based classification methods, namely LRC [7], CRC [8], and ProCRC [9], on a range of different datasets. For TSL-LSR, DLSR, ReLSR, GReLSR and RLSL, we use the NN classifier. The datasets are of two types: (1) Face: the AR [17], CMU PIE [18] and Feret [19] datasets; (2) Object: the COIL-20 [20] dataset. For each dataset, we randomly select several images of each class for training, and the remaining images are used for testing. We repeat all the experiments ten times and report the mean classification results (mean ± std). A brief description of these datasets is given in Table I.
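The protocol above (random per-class training selection, ten repetitions, mean and standard deviation of the accuracy) could be scripted roughly as follows. The helper names, the `fit_predict` callable standing in for any of the compared classifiers, and the fixed seed are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np

def random_split(labels, n_train_per_class, rng):
    """Pick n_train_per_class random samples of each class for training; the rest are for testing."""
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.permutation(idx)[:n_train_per_class])
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

def repeated_evaluation(X, labels, n_train_per_class, fit_predict, n_runs=10, seed=0):
    """Return mean and standard deviation of the accuracy (%) over n_runs random splits.

    fit_predict(X_train, y_train, X_test) is a hypothetical callable that trains a
    classifier and returns predicted labels for the columns of X_test.
    """
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        tr, te = random_split(labels, n_train_per_class, rng)
        pred = fit_predict(X[:, tr], labels[tr], X[:, te])
        accs.append(100.0 * np.mean(pred == labels[te]))
    return float(np.mean(accs)), float(np.std(accs))
```

For example, Table I lists 1000 training images for the 100 classes of AR, which corresponds to `n_train_per_class=10` in this sketch.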
III-A Classification results on different datasets
We first need to determine the value of , where is the row dimensionality of transition matrix . In fact, it is very difficult to tune its value, because could be . From [21], we know can be set to around , where is the number of classes. Fig. 2 presents the classification accuracies (%) versus the value of on two face datasets. We can see that the change in accuracy is not obvious while and the peak is achieved if is approximately equal to . Therefore, in our experiments, we directly fix on all datasets.
The comparative classification results on the four datasets are shown in Table II. As shown in Table II, our TSL-LSR model consistently achieves better accuracies than the other algorithms, including the two most recent ones, GReLSR and RLSL. This is mainly because the DLSR, ReLSR and GReLSR algorithms all focus on learning slack regression targets without guarding against the problem of over-fitting. In contrast, TSL-LSR introduces a low-rank transition subspace to alleviate the structural information loss caused by restrictive matrix projection. Its two learned projection matrices have a greater capacity to capture the discriminative information conveyed by the data during projection learning. To further validate whether the two projections learned by the TSL-LSR model can capture discriminative features from the original samples, we use the t-SNE algorithm [22] to visualize the distribution of the extracted features. From Fig. 3, we can see that TSL-LSR correctly distributes all the samples into their own subspaces and that the distribution of intra-class samples is very compact, which indicates that the extracted features exhibit the desired inter-class separability and intra-class compactness. This also demonstrates that transition subspace learning is beneficial for classification.
III-B Convergence Validation
Based on the optimization procedures in Section II-B, it is easy to prove that the objective of the proposed TSL-LSR model is convex with respect to each variable when the other variables are fixed. In this section, we validate the convergence of Algorithm 1 on two datasets. The convergence results are shown in Fig. 4. We can see that Algorithm 1 converges well: the value of the objective function of TSL-LSR decreases monotonically as the number of iterations increases. This confirms the effectiveness of the adopted optimization algorithm.
III-C Parameter Sensitivity
In this section, we test the parameter sensitivity of TSL-LSR. TSL-LSR has four parameters to be tuned in our experiments. Two of these parameters are both set to 0.01, so we focus on selecting the values of the remaining two parameters from a candidate set. The classification accuracy as a function of these parameter values on the four datasets is shown in Fig. 5. It is apparent that the classification accuracy of TSL-LSR is not very sensitive to the values of these two parameters.
IV Conclusion
In this paper, an effective transition subspace learning based least squares regression model (TSL-LSR) is proposed for multicategory image classification. Unlike traditional LSR based regression models, which directly learn a projection from the original samples to the corresponding label subspace, TSL-LSR learns a low-rank transition subspace to avoid the problem of overfitting caused by restrictive projection learning. Moreover, TSL-LSR imposes a low-rank constraint on the transition matrix to capture more of the underlying structure of the data. Two discriminative projection matrices are learned for classification. Experiments on four image datasets demonstrate the effectiveness of the proposed method.
References
- [1] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Statist. Soc. B (Methodol.), vol. 58, no. 1, pp. 267-288, 1996.
- [2] S. Wold, H. Ruhe, H. Wold, and W. Dunn, "The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses," J. Sci. Stat. Comput., vol. 5, no. 3, pp. 735-743, Jan. 1984.
- [3] L. Jiao, L. Bo, and L. Wang, "Fast sparse approximation for least squares support vector machine," IEEE Trans. Neural Netw., vol. 18, no. 3, pp. 685-697, May 2007.
- [4] S. An, W. Liu, and S. Venkatesh, "Face recognition using kernel ridge regression," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Minneapolis, MN, USA, pp. 1-8, Jun. 2007.
- [5] T. Strutz, "Data Fitting and Uncertainty: A Practical Introduction to Weighted Least Squares and Beyond," Wiesbaden, Germany: Vieweg, 2010.
- [6] J. Wright, A. Y. Yang, A. Ganesh, et al., "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210-227, 2009.
- [7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2106-2112, 2010.
- [8] L. Zhang, M. Yang, and X. Feng, "Sparse representation or collaborative representation: Which helps face recognition?" in Proc. of IEEE Int. Conf. Comput. Vis., pp. 471-478, 2011.
