Residual Expansion Algorithm: Fast and Effective Optimization for Nonconvex Least Squares Problems
Daiki Ikami, Toshihiko Yamasaki, Kiyoharu Aizawa

TL;DR
The Residual Expansion (RE) algorithm offers a fast, effective, and easy-to-implement method for near-global optimization in nonconvex least squares problems, outperforming traditional stochastic approaches.
Contribution
It introduces the RE algorithm, a novel deterministic optimization technique that achieves fast, near-global solutions for nonconvex least squares problems, suitable for high-dimensional tasks.
Findings
Excellent empirical performance in k-means clustering
Effective in point-set registration and image deblurring
Outperforms stochastic methods in speed and accuracy
Abstract
We propose the residual expansion (RE) algorithm: a global (or near-global) optimization method for nonconvex least squares problems. Unlike most existing nonconvex optimization techniques, the RE algorithm is not based on either stochastic or multi-point searches; therefore, it can achieve fast global optimization. Moreover, the RE algorithm is easy to implement and successful in high-dimensional optimization. The RE algorithm exhibits excellent empirical performance in terms of k-means clustering, point-set registration, optimized product quantization, and blind image deblurring.
| 0.905 | 0.894 | 0.902 | 0.921 | |
| 0.854 | 0.856 | 0.862 | 0.876 | |
| 0.843 | 0.844 | 0.846 | 0.854 | |
| 0.837 | 0.839 | 0.840 | 0.843 |
| 0.905 | 0.894 | 0.902 | 0.921 | |
| 0.854 | 0.856 | 0.862 | 0.876 | |
| 0.843 | 0.844 | 0.846 | 0.854 | |
| 0.837 | 0.839 | 0.840 | 0.843 |
| 2.699 | 1.758 | 1.493 | 0.998 | |
| 2.784 | 1.209 | 0.789 | 0.630 | |
| 2.722 | 1.036 | 0.708 | 0.552 | |
| 2.749 | 0.994 | 0.630 | 0.552 |
| Number of successes | Number of iterations | ||||
| ICP algorithm | 26 | 4 | 1 | 46.7 | |
| RE algorithm () | 46 | 25 | 2 | 47.4 | |
| 49 | 31 | 5 | 110.1 | ||
| 49 | 33 | 6 | 310.2 | ||
| 49 | 36 | 6 | 1008 | ||
| Image | #1 | #2 | #3 | #4 | #5 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kernel | #1 | #2 | #3 | #4 | #1 | #2 | #3 | #4 | #1 | #2 | #3 | #4 | #1 | #2 | #3 | #4 | #1 | #2 | #3 | #4 |
| Pan et al. [23] | 13.6 | 13.2 | 20.3 | 13.1 | 14.9 | 16.5 | 14.1 | 11.2 | 11.8 | 20.6 | 19.4 | 10.0 | 13.7 | 11.8 | 19.1 | 11.9 | 19.6 | 19.2 | 16.8 | 14.1 |
| Alg. 1 | 20.2 | 20.2 | 21.3 | 16.4 | 17.0 | 16.9 | 15.8 | 7.4 | 20.9 | 18.8 | 20.4 | 6.5 | 23.6 | 20.4 | 19.7 | 12.5 | 21.3 | 18.2 | 17.5 | 9.4 |
| Alg. 2 | 20.2 | 19.6 | 20.5 | 15.7 | 17.2 | 17.1 | 15.8 | 8.4 | 19.5 | 21.5 | 20.7 | 14.2 | 23.6 | 20.5 | 19.1 | 11.9 | 21.9 | 17.7 | 18.6 | 15.2 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification
Residual Expansion Algorithm: Fast and Effective Optimization for
Nonconvex Least Squares Problems
Daiki Ikami Toshihiko Yamasaki Kiyoharu Aizawa
The University of Tokyo, Japan
{ikami, yamasaki, aizawa}@hal.t.u-tokyo.ac.jp
Abstract
We propose the residual expansion (RE) algorithm: a global (or near-global) optimization method for nonconvex least squares problems. Unlike most existing nonconvex optimization techniques, the RE algorithm is not based on either stochastic or multi-point searches; therefore, it can achieve fast global optimization. Moreover, the RE algorithm is easy to implement and successful in high-dimensional optimization. The RE algorithm exhibits excellent empirical performance in terms of k-means clustering, point-set registration, optimized product quantization, and blind image deblurring.
1 Introduction
Many problems in computer vision and machine learning can be formulated as optimization problems. If we can formulate a problem as a convex optimization, we can solve it by convex optimization techniques such as gradient-based methods. However, most optimization problems are nonconvex and often have many local minima. In these cases, convex optimization techniques can find only local minima.
Global optimization of nonconvex problems is an NP-hard problem in most cases. Therefore, heuristic methods are often used to find a global (or near-global) optimum. There are two major approaches: good initialization and stochastic optimization. The former is fast and effective if we can obtain a good initial guess [2]; however, many optimization problems do not provide this. The latter is random search or multiple-point search, which is represented by simulated annealing (SA) [13], particle swarm optimization (PSO) [12], and genetic algorithms (GA) [19]. Although these methods are effective with low-dimensional optimization problems, it is difficult to obtain good solutions with high-dimensional ones. Moreover, these approaches often require excessive computation time to obtain a good solution.
In this paper, we propose a fast and effective optimization method for nonconvex least squares (LS) problems such as k-means clustering and point-set registration. First, we propose a novel measure of convergence called RE convergence: this represents how far we can expand data points along their residual directions under convergence. Fig. 1 shows k-means results and expanded data points. Fig. 1(a) depicts convergence on expanded data while Fig. 1(b) shows a case that is not converged. We presume that RE convergence is associated with global convergence. In fact, we can prove that the solution that is stable on a large expansion is the global optimum in the case of a one-dimensional quartic minimization problem.
Additionally, we propose a heuristic algorithm to find a solution that is stable on the large expansion, which we term the residual expansion (RE) algorithm. This algorithm is based on neither multiple-point search nor random search, and thus fast computation can be achieved.
Our contribution is as follows:
We propose a novel concept of convergence, RE convergence. We show the relationship between RE convergence and the global optimum. 2. 2.
We propose the RE algorithm, which can be applied for any nonconvex LS problem. We show that the RE algorithm is fast, effective, and easy to implement. 3. 3.
We show the RE algorithm’s excellent performance for various nonconvex LS problems such as k-means clustering, point-set registration, optimized product quantization, and blind image deblurring.
2 Related works
2.1 Nonconvex least squares problems
We focus on nonconvex LS problems, of which many exist. In this paper, we study the following four important problems in computer vision and machine learning.
2.1.1 K-means clustering
K-means clustering is one of the most popular clustering methods with various applications such as quantization [11], feature learning [8], and segmentation [1]. K-means clustering assigns data vectors to the nearest representative clusters. The optimization problem can be formulated as
[TABLE]
where is a data matrix, is a matrix of cluster centroids, and is an assignment matrix.
The most popular optimization method is Lloyd’s algorithm [17], which has an update step (fix and update ) and an assignment step (fix and update ). Hartigan’s algorithm [10] achieves better clustering than Lloyd’s algorithm because the set of local minima of Hartigan’s algorithm is a subset of those of Lloyd’s algorithm [26, 25]. For good initialization, k-means++ [2] is often used because of its efficiency and effectiveness.
2.1.2 Point-set registration
Point-set registration is a fundamental problem in computer vision. Here we consider a rigid 3D-point-set registration problem: Given source point sets and target point sets , we estimate the best rigid transformation parameters.
In this paper, we consider the following optimization problem with a point-to-point cost function:
[TABLE]
where is a rotation matrix, is a translation vector, and is an assignment matrix. is an identity matrix and is a vector of all ones.
The iterative closest point (ICP) algorithm [3] is a well-known alternating optimization method: it fixes and updates , , and then fixes and updates . To obtain a global minimum, some studies adopt stochastic optimization, such as GA [24], PSO [27], and SA [18]. Recently, Yang et al. proposed Go-ICP [29], which guarantees global optimality by using the branch-and-bound algorithm. However, it requires significant computation time.
2.1.3 Optimized product quantization
Optimized product quantization (OPQ) [9, 22], which is an extension of product quantization (PQ), is an efficient fast approximate nearest neighbor search method. The optimization problem in OPQ is described by
[TABLE]
where have the same meaning as in Section 2.1.1 and is a rotation matrix.
The optimization problem of Eq. (6) can be solved by alternating optimization of , , and [9, 22]. Ge et al. also proposed a parametric optimization method that assumes the data follows a parametric Gaussian distribution [9].
2.1.4 Blind image deblurring
Blind image deblurring has long been a challenging problem in computer vision. From a blurred image , we estimate an original image and blur kernel by minimizing the following cost function:
[TABLE]
where and are the regularization terms, and denotes the convolution operator. For , L0-norm (or approximately L0-norm) [28, 23], or L1/L2 functions [15] are proposed to enforce the sharp edges of the original image. For , L2-norm [28, 23] or L1-norm [15] are often used. We refer to the paper [16] for a recent comparative study of blind image deblurring.
We can minimize Eq. (7) by alternating optimization of and . For fast and effective optimization, a coarse-to-fine strategy [7, 15, 28, 23] is generally employed.
2.2 Nonconvex optimization techniques
Most nonconvex optimization techniques are based on stochastic optimization, including GA [19], PSO [12], and SA [13]. These methods generally do not work well or require significant computation time for high-dimensional optimization problems. Several studies [4, 14, 18, 27, 24] have employed these nonconvex optimization techniques to our target problems described in Section 2.1; however, these methods are not often used in practice.
Our approach is related to graduated nonconvexity (GNC) [5], which first solves a simplified optimization problem and then gradually transforms the problem into the original nonconvex problem. The basic concept of graduated optimization methods is extinguishing local minima by using a convexified objective function, and then gradually changing the objective function to the original function. We refer readers to [20] for a recent survey of graduated optimization. In contrast to GNC, our approach is explicitly to escape from poor local minima by using a largely expanded objective function and then gradually transforming it into the original function, as described in Section 4.
3 Residual expansion convergence
First, we describe RE convergence, which indicates how we can expand data along their residual directions. RE convergence is a measure of the depth of convergence, and our proposed algorithm is based on this concept. We show a relationship between the global optimum and RE convergence.
We discuss a nonconvex least squares (LS) optimization problem whose objective function is given by
[TABLE]
Our definitions are as follows.
Definition 3.1** (Residual Expansion).**
Let be a local minimum point of Eq. (8). We define the -expanded objective function :
[TABLE]
where is constructed by expanding in the residual direction with a magnitude of as
[TABLE]
We call the operation that constructs the -expanded objective function residual expansion (with ).
Definition 3.2** ( RE convergence).**
* is called RE convergence if there exists a constant such that is still a local optimum of . In particular, the maximum (or supremum) constant is called the RE constant111If is always a local minimum of under all , we define the RE constant as . .*
Our hypothesis is that a solution with a larger -RE constant is closer to the global optimum solution. This hypothesis holds in the case of quartic minimization, as presented in section 3.1.1.
3.1 Unconstrained and differentiable problems
We consider one of the simplest cases: unconstrained and differentiable LS problems. Given a local optimum , we can obtain first- and second-order derivatives of the -expanded objective function at as
[TABLE]
where is a Jacobian matrix and is
[TABLE]
Eq. (11) means that is always a stationary point of . Therefore, is a local minimum of if and only if is a positive semi-definite (PSD) matrix. If is not a PSD matrix, there is a which satisfies the fact that is not a PSD matrix.
Fig. 2 shows examples of -expanded objective functions. Residual expansion elevates the objective function around , and if is sufficiently large then it ceases to be a local minimum.
One-dimensional quartic minimization: Here we consider a quartic minimization problem—in particular, one that can be formulated as an LS problem:
[TABLE]
We consider the case where Eq. (14) has two local minima and . The following theorem then holds:
Theorem 1**.**
Let and be local minima points of Eq. (14) with RE constants of and , respectively. is the global minimum point if and is the global minimum point otherwise.
Proof.
Please refer to the supplementary materials. ∎
3.2 General relationship between the RE convergence and the global optimum
It is not obvious when our hypothesis, i.e., that a solution with a larger RE constant is closer to the global optimum, is valid. Unfortunately, we can easily find a counterexample in k-means clustering, as shown in Fig. 3. However, our algorithm, which aims to find a solution with a large RE constant, works well from an empirical perspective in many nonconvex LS problems.
4 Residual expansion algorithm
In this section, we propose the RE algorithm, which aims to find a solution with a large RE constant. Since it is difficult to find the solution with the largest RE constant exactly, we employ a heuristic strategy.
The RE algorithm has two steps: parameter updating and residual expansion. We show an intuitive illustration of the algorithm in Fig. 4. For the residual expansion step, we expand data along their residual direction. This results in elevating the objective function around the current solution as in Fig. 2. For the parameter-updating step, instead of minimizing the original function Eq. (8), we minimize the following expanded objective function in each iteration:
[TABLE]
where is an expanded data vector:
[TABLE]
where and are expansion and momentum parameters, respectively. Note that, if , Eq. (15) is an exactly -expanded objective function on . The momentum parameter is important for achieving good performance and ensuring that the RE algorithm does not to diverge, as described later.
The RE algorithm iterates through parameter updating by minimizing Eq. (15) and residual expansions by Eq. (16) and Eq. (17). We use a large initially to find a solution with a large RE constant. Then we decrease gradually to achieve convergence, analogous to a temperature parameter in SA. We summarize the RE algorithm in Alg. 1.
4.1 Parameter setting for convergence
The RE algorithm has two parameters, and , for each iteration. We decrease to 0 for convergence. when , there is no residual expansion and RE algorithm is guaranteed to converge if the original LS problem has a convergence-guaranteed algorithm. However, inadequate parameters of and cause unstable optimization. We consider the norm of . We can obtain
[TABLE]
We use for the last approximation of Eq. (18). Eq. (18) suggests to make the RE algorithm stable. A good way to determine these values of and is described in Section 5.
4.2 Advantages of the RE algorithm
Our algorithm consists of two steps of parameter updating and residual expansion. Parameter updating is based simply on a typical LS problem approach. Therefore, if there is a source code which minimizes Eq. (8), for example, by alternative optimization or gradient methods, then we can implement our algorithm by adding a residual expansion step to the existing code, which can be done in a few lines of code.
Moreover, the computational complexity of residual expansion is generally less than that of parameter updating. Therefore, we can achieve faster optimization than most nonconvex optimization techniques based on multi-point search or random search, such as SA and GA.
As described in Section 2.2, GNC is a similar approach to ours; however GNC often does not apply for LS problems. Our algorithm can be applied for any nonconvex LS problem provided that there is a method for finding a local optimum, such as Lloyd’s algorithm for k-means clustering and ICP algorithms for point-set registration.
5 Alternate direction method of multipliers for least squares problems
In this section, we apply the alternate direction method of multipliers (ADMM) [6] to solve Eq. (8). We show that ADMM is a special case of the RE algorithm for LS problems. Moreover, ADMM suggests a modified RE algorithm for regularized LS problems.
We introduce an auxiliary variable and rewrite Eq. (8) as a constrained optimization problem:
[TABLE]
We can construct the augmented Lagrangian function of Eq. (19) as
[TABLE]
We take the alternating direction approach for solving Eq. (20) and then obtain update rules as
[TABLE]
5.1 Relation to the RE algorithm
We can simplify Eq. (21), Eq. (22) and Eq. (23) as
[TABLE]
Details of the derivation are described in the supplementary material. This is a special case of the RE algorithm of Eq. (16) and Eq. (17) with
[TABLE]
There are two main advantages to using ADMM. First, we can choose only instead of parameters and in the general RE algorithm. Eq. (26) and Eq. (27) always satisfy , which is a condition necessary for avoiding divergence to infinity, as described in Section 4.1, and this update achieves good performance in experiments. Second, ADMM can treat regularized LS optimization problems, such as blind image deblurring (Eq. (7)). We will describe this in the next section.
5.2 Regularized least squares problems
We consider a regularized LS problem as follows:
[TABLE]
When we apply the RE algorithm in a straightforward manner, we can obtain the following objective function in each iteration:
[TABLE]
In the case of ADMM, from Eq. (24), the objective function is as follows:
[TABLE]
We can find that the difference between Eq. (29) and Eq. (30) is simply the coefficient of the squared term. We find that minimizing Eq. (30) achieves better performance than minimizing Eq. (29). We summarize the RE algorithm based on ADMM in Alg. 2.
6 Implementation details
We used the RE algorithm based on ADMM (Alg. 2) unless otherwise stated. In the update of (line 2 in Alg. 2), we perform alternating optimization with a single iteration; for example, with k-means clustering, the cluster centers and assignments are updated only once. The four problems we treat in this paper can be minimized by alternating optimization.
For the parameter , we adapt , where to satisfy . Therefore, we only need to determine the two parameters and in our method.
7 Experimental results
We evaluate the performance of the RE algorithm on four nonconvex LS problems: k-means clustering, 3D point set registration, OPQ, and single blind image deblurring. All experiments were executed on an Intel Core i5-4200U CPU (1.60 GHz) with 8GB of RAM, and were implemented in MATLAB222Our codes will be available if the paper is accepted. except for Go-ICP [29]. For Go-ICP and its comparison experiment, we used the publicly available code333http://iitlab.bit.edu.cn/mcislab/~yangjiaolong/go-icp/ implemented in C++.
7.1 K-means clustering
We compared our method with k-means++ [2], which is a good initialization method, and Hartigan’s algorithm [10]. For Hartigan’s algorithm, we first used Lloyd’s algorithm [17] with k-means++ initialization for fast computation. We reported the total time of Lloyd’s algorithm and Hartigan’s algorithm. For the other method, we used Lloyd’s algorithm for optimization. We used random initialization for the RE algorithm. For error measurement, we used the objective function value of Eq. (1) and reported relative error from the value of k-means++ (therefore, the relative error of k-means++ is always 1).
7.1.1 Synthetic data experiments
We start with two synthetic datasets as shown in Fig. 5. We repeated each method 50 times from different initializations and report the average relative errors. Table 1 shows the results of our method with different and . We found that larger achieved better performance. We also found that smaller achieved better performance in dataset B; however, larger achieved better performance in dataset A. This indicates that the best setting is different for different data distributions. Intuitively, dataset B requires a larger residual expansion (in other words, small ) to escape from a poor local minimum, while dataset A requires a smaller residual expansion.
We show comparison results in Table 2. We repeated each method 50 times from different initializations. K-means++ worked well with dataset B. On the other hand, Hartigan’s algorithm can improve the results of k-means++ in dataset A; however, this does not work in dataset B. The RE algorithm worked best in both cases, even though it was initialized by random seeding. Moreover, the RE algorithm with achieved comparable speed to k-means++ with better performance for dataset A.
7.1.2 Real-world data experiments
We used two real-world datasets for comparison: the cloud dataset444https://archive.ics.uci.edu/ml/datasets/Cloud and the COIL20 dataset [21]. We performed experiments in the same manner as in Section 7.1.1.
Table 3 shows comparative results. In the cloud dataset, k-means++ achieves faster and better clustering than random seeding. The RE algorithm with achieved better clustering than k-means++ with about 1.8 times the computational cost. The RE algorithm with performed best, and found the near-global optimum in every case. For the COIL20 dataset, although k-means++ and Hartigan’s algorithm did not work well, the RE algorithm significantly outperformed the other methods.
7.2 Point set registration
We compared the RE algorithm with the ICP algorithm and Go-ICP [29]. Go-ICP is known as a method that can achieve global optimization. We used the bunny model from the Stanford3D dataset555http://graphics.stanford.edu/data/3Dscanrep/, as in Fig. 6. For the target model, we used a partial point set as in Fig. 6(b). In the experiments, point sets were normalized within a cube of .
We made a rotation matrix from a random rotation axis and the rotation angle . The target point set was constructed by this rotation matrix, and we added Gaussian noise with a standard deviation of . We performed 50 tests with different random rotation axes at each rotation angle . For measurement of the error, we used the objective value Eq. (2) and regarded the results as successful if the objective error was less than 1 (this value is approximately twice of the average objective value using Go-ICP, as in Fig. 7).
We first show the comparison results between the RE algorithm and the ICP algorithm as in Table 4. The RE algorithm with achieved a better success rate with almost the same number of iterations as the ICP algorithm. Using a large can improve the results to a small extent.
We also compare our method to Go-ICP [29]. Go-ICP has two steps: the ICP algorithm and the branch-and-bound algorithm. We compared the original Go-ICP and RE + Go-ICP, which has the two steps of ICP with the RE and branch-and-bound algorithms. Fig. 7 plots all 50 results in . Note that this comparison was implemented entirely in C++. Go-ICP always found the global optimum solution; however, it required significant computation. RE + Go-ICP reduced computational cost while achieving global optimization.
7.3 Optimized product quantization
We show that the RE algorithm is successful in OPQ optimization problems. We compare our method with the alternating optimization method [9, 22]. For a dataset, we used SIFT 1M [11], which contains 100,000 128-dimensional SIFT descriptors for training. We set the subspace number and cluster number , which are often used in the field of approximate nearest neighbor search. For error measurement, we used the objective function value of Eq. (6). For our method, we set .
We plot the objective function value versus iteration number in Fig. 8. We performed five repetitions using different initializations and report the average objective values obtained. Our method improved the objective function value; moreover, it achieved rapid convergence in the cases of and . The RE algorithm elevates the objective function around the current solution; in other words, it transforms the gradient for the current solution into a steeper gradient, potentially causing rapid convergence.
7.4 Blind image deblurring
We evaluated our method with single blind image deblurring. There are many formulations for blind image deblurring. In this paper, we followed Pan et al.’s formulation [23], which can be minimized by alternating optimization. We compared three methods as follows: a coarse-to-fine strategy [23] and RE algorithms based on Alg. 1 and Alg. 2. We used the uniform blurred text images from the dataset provided by Lai et al. [16], which contains five latent images and four blurring kernels for a total of 20 blurred images. For all methods, we used the same objective function parameters, such as the regularization coefficients. For our method, we set and .
We show the results in Table 5. Our method significantly outperforms Pan’s method [23] and is successful for a significantly blurred image, as in Fig. 9. We found that Alg. 2 is superior to Alg. 1 in the cases of {image #3, kernel #4} and {image #5, kernel #4}. Note that these results are obtained by minimizing the same objective function, however using different optimization methods. Therefore our method likely improves upon other methods which use different objective functions [15, 28].
8 Conclusion
We proposed the RE algorithm, which is a novel global optimization algorithm for nonconvex LS problems. This method is based on a novel measurement of global convergence called RE convergence. We presented theoretical analysis of RE convergence and empirical results showing excellent performance of the RE algorithm for various optimization problems.
There remain many open questions in both theoretical and empirical aspects. We can guarantee that the solution with the largest RE constant is the global optimum in limited cases. To which problems this applies remains unknown. We plan to investigate the applicability of the RE algorithm to other nonconvex optimization problems, including non-LS problems.
Acknowledgement
This research is partially supportted by CREST (JPMJCR1686) and KAKENHI(15K12025)
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI , 34(11):2274–2282, 2012.
- 2[2] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In SODA , pages 1027–1035, 2007.
- 3[3] P. J. Besl and H. D. Mc Kay. A method for registration of 3-D shapes. TPAMI , 14(2):239–256, 1992.
- 4[4] G. Blais and M. D. Levine. Registering multiview range data to create 3D computer objects. TPAMI , 17(8):820–824, 1995.
- 5[5] A. Blake and A. Zisserman. Visual reconstruction . MIT Press, 1987.
- 6[6] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning , 3(1):1–122, 2011.
- 7[7] S. Cho and S. Lee. Fast motion deblurring. ACM To G , 28(5):145:1–145:8, 2009.
- 8[8] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In In Workshop on Statistical Learning in Computer Vision, ECCV , pages 1–22, 2004.
