Fast High-Dimensional Kernel Filtering
Pravin Nair, Kunal N. Chaudhury

TL;DR
This paper introduces a scalable, fast kernel filtering method for high-dimensional images using the Nyström approximation, enabling efficient bilateral and nonlocal means filtering with theoretical error guarantees.
Contribution
It extends low-rank kernel approximation techniques to high-dimensional data via the Nyström method, overcoming scalability issues of previous approaches.
Findings
Effective filtering of color and hyperspectral images
Competitive performance with state-of-the-art algorithms
Theoretical bounds on approximation error
Abstract
The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nystr\"om method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Fast High-Dimensional Kernel Filtering
Pravin Nair, and Kunal N. Chaudhury
Abstract
The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nystrm method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal means filtering. We demonstrate the effectiveness of our proposal for bilateral and nonlocal means filtering of color and hyperspectral images. In particular, our method is shown to be competitive with state-of-the-art fast algorithms, and moreover it comes with a theoretical guarantee on the approximation error.
Index Terms:
Kernel Filter, Nystrm Method, Approximation, Fast Algorithm, Error Bound.
I Introduction
The bilateral and nonlocal means filters [1, 2] are widely used for edge-preserving smoothing and denoising of images [3, 4]. These are instances of kernel filters, where the similarity (affinity) between pixels is measured using a symmetric kernel. We refer the reader to [4] for an excellent review of kernel filters. While they have proven to be useful in practice, a flip side of kernel filtering, including bilateral filtering (BLF) and nonlocal means (NLM), is their computational complexity [3]. Nevertheless, several fast algorithms have been proposed, e.g. [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26], which can speed up BLF and NLM, without compromising their filtering quality. See [21, 11, 26] for a survey of these algorithms. Unfortunately, most algorithms only work with grayscale images, and cannot be extended to color, multispectral, and hyperspectral images.
Algorithms for fast BLF of color images have been proposed in [12, 21, 27, 28, 29]. However, to the best of our knowledge, these methods have not been extended for multispectral and hyperspectral images. Fast algorithms for generic high-dimensional BLF and NLM have been proposed in [18, 19, 20, 30]. A common feature of these algorithms is that they use data clustering or tessellation in high-dimensions. The state-of-the-art fast algorithms for color BLF are [19, 21], and for color NLM is [20].
More recently, it was shown in [15, 17] that fast BLF of grayscale images can be performed using the partial eigendecomposition of the kernel matrix. In fact, the interpretation of BLF (and NLM) as kernel filters goes back to [31, 32, 33]. While the Nystrm method has widely been used in machine learning [34, 35, 36], it appears that [31] is the first to apply this for image filtering. Note that, unlike [15, 17], the spatial and range kernel are treated as a single kernel in [31, 32, 33].
The differences between our and related approaches are:
As explained in detail in §II, it is difficult to scale [15, 17] for filtering high-dimensional (even color) images, since one needs to populate a huge kernel matrix and compute its eigendecomposition. We propose to use the Nystrm method to solve this problem. As a result, we are able to perform BLF and NLM of color and hyperspectral images.
The first difference with [31, 32, 33] is that we use clustering instead of uniform sampling for the Nystrm approximation. A significant improvement in filtering accuracy is achieved as a result. The other difference is that if a spatial kernel has to incorporated in [31, 32, 33], then the Nystrm approximation needs to be performed in the spatio-range space. However, we handle the spatial and range components differently—fast convolutions are used for the spatial component and Nystrm approximation is used for the range component. As a result, we require lesser samples for the Nystrm approximation.
In [28, 29], clustering is used to compute “intermediate” images, which are interpolated to get the final output. On the other hand, clustering is used in our method just to obtain the “landmark points” for the Nystrm approximation.
Compared to [18, 19, 20, 21], our algorithm is conceptually simple and easy to implement. Moreover, we are able to derive a bound on the filtering error incurred by the approximation. Such a guarantee is not offered by [18, 19, 20, 21].
The rest of the paper is organized as follows. In §II, we introduce the notion of kernel filtering, and explain the core problem in relation to the spectral approximations in [15, 17]. We use the Nystrm method in §III to overcome this problem. Numerical results are reported in §IV and we conclude in §V.
II Background
We begin by formulating BLF and NLM as kernel filters [4]. Suppose the input image is , where is the spatial domain, is the range space, and (resp. ) is the dimension of the domain (resp. range). Let be the guide image, which is used to control the filtering. For standard BLF, and are identical, and and for grayscale and color images. However, and (also and ) can be different for joint BLF [3]. For NLM, is generally larger than , where is the number of pixels in a patch [2]. Let be the range kernel. The filtered output is given by
[TABLE]
where is a square window around consisting of pixels, with being the window radius. The spatial kernel controls the weighting of the neighboring pixels involved in the averaging. At this point, we just assume that is symmetric, i.e., for . For example, , for standard BLF and NLM, where is the Euclidean norm.
It was shown in [15, 17] that the non-linear operations in (1) can be computed using convolutions by approximating . For convenience, we will describe this using our notations. Let the actual range of be
[TABLE]
We emphasize that is a list and not a set, i.e., we allow repetition of elements in . In particular, let be some ordering of the elements in , where is the number of elements. This means that, given , for some . We track this correspondence using the index map , where
[TABLE]
We next define the kernel matrix given by
[TABLE]
In terms of (4), we can write (1) as
[TABLE]
It is clear from (4) that is symmetric. In particular, let the eigendecomposition of be
[TABLE]
where are its eigenvalues, and are the corresponding eigenvectors. Substituting (6) in (5), we can write its numerator as
[TABLE]
On switching the sums, this becomes
[TABLE]
where denotes the convolution of the image \boldsymbol{h}_{k}(\boldsymbol{x})=\boldsymbol{u}_{k}\big{(}\iota(\boldsymbol{x})\big{)}\boldsymbol{f}(\boldsymbol{x}) with . An identical argument applies for the denominator. In summary, we can compute (5) using convolutions, for which several efficient algorithms are available [37, 38]. Moreover, by considering just the largest eigenvalues, fast and accurate approximations can be obtained [15, 17].
Unfortunately, computing the full kernel and its eigendecomposition becomes prohibitively expensive when is large. Just as an example, consider an -bit color image for which and . Even if we assume that is just of the maximum range cardinality (), we will still need to populate a matrix, and compute its eigenvalues. The situation is worse for hyperspectral images, where is of the order of tens, or even hundreds.
III Proposed Method
Originally, the Nystrm method was used for approximating the solution of functional eigenvalue problems [39, 40]. The method has found useful applications in machine learning and computer vision for approximating the eigendecomposition of large matrices [34, 35, 31]. In the present context, the goal is to approximate (6) using a decomposition of the form
[TABLE]
where and . Clearly, the rank of is at most . Thus, for small , is a low-rank approximation of . A large results in a better approximation, but at higher computational cost. In practice, a good tradeoff is required.
The original kernel is defined on . In the Nystrm method [39, 40], we first construct a smaller kernel , compute its eigendecomposition, and then “extrapolate” the eigenvectors of to approximate those of . More precisely, we pick few landmarks points from , say, , and define a kernel on :
[TABLE]
Clearly, is symmetric, and its size is much smaller than . Thus, we can efficiently compute its eigendecomposition:
[TABLE]
where and . We next construct on given by
[TABLE]
where and . This captures the kernel values between the points in and the landmark points. This matrix is used to extrapolate as follows:
[TABLE]
This completes the specification of and in (8). We refer the reader to [35] for the intuition behind the approximation. The effective speedup of replacing (6) by (8) is . This is because the complexity of eigendecomposition of a matrix is [41]. In particular, the speedup is significant since . As will be evident shortly, we just need to compute and ; we will not use explicitly.
Following [36], we select the landmark points by clustering . More specifically, we partition into disjoint sets using -means clustering, and take the centroids to be the landmarks. Note that, though is not guaranteed to be a subset of , we can still apply the above approximation.
It was shown in [36] that the kernel error can be bounded by the quantization error. More specifically, let be the kernel error ( is the Frobenius norm), and let
[TABLE]
be the quantization error, where is the minimizer of over . Then the following bound holds [36].
Proposition 1
Suppose there exists some such that, for ,
[TABLE]
Then the approximation error can be bounded as
[TABLE]
where the positive constants and do not depend on . In particular, (13) holds when is a Gaussian.
Proposition 1 suggests that we can reduce the kernel error by making small. However, measures how well is represented by the landmark points. Following this observation, -means clustering was used in [36] for determining the landmarks. It was empirically shown in [36] that clustering indeed results in smaller error over uniform sampling [35, 31]. We will see that this is also true for our algorithm.
We arrive at a fast algorithm by replacing by . It is clear from (7) that the resulting approximation is given by
[TABLE]
[TABLE]
where and are defined as and .
The computation of (14) and (15) involves convolutions, since for each , there are convolutions in (14) and one in (15). The main point is that we have been able to express the non-linear kernel filter using convolutions, for which efficient algorithms are available. In particular, (14) and (15) can be performed using operations (w.r.t. the size of the spatial kernel), when is a box or Gaussian [42, 37, 38]. The overall algorithm is described in Algorithm 1 (source code in [43]), where the symbols and are used to denote pixelwise addition, multiplication, and division. The complexity of -means clustering and the eigendecomposition of are [44] and [41]. On the other hand, the complexity of the convolutions in (14) and (15) is , where is the number of pixels. Since the complexity of the brute-force implementation is \mathcal{O}\big{(}|\Omega|(2S+1)^{d}(n+\rho)\big{)} [3], and convolutions are the dominant operations in our algorithm, we obtain an effective speedup of . This is significant as is typically large [3].
We now comment on the filtering accuracy, namely, how well is (1) approximated by (14). Intuitively, we expect the approximation to be accurate if . In fact, since the difference is controlled by the quantization error (Proposition 1), we have the following result.
Theorem 2
Suppose and are positive, and satisfies the property in Proposition 1. Then
[TABLE]
where do not depend on .
The main steps of the derivation are given in the supplement. Theorem 2 is true for BLF and NLM, where is a Gaussian. A practical implication of this result is that the filtering accuracy is guaranteed to increase with (Figure in the supplement). Deriving a similar bound is difficult for [18, 19, 20, 21].
IV Results
We demonstrate the effectiveness of our algorithm for BLF and NLM of high-dimensional images by comparing it with state-of-the-art algorithms. Instead of standard NLM [2], we have used PCA-NLM [45], where the denoising performance of the former is improved by applying PCA on the collection of patches. As for the dataset, we have used the color images from [46] and the hyperspectral images from [47]. Experiments were performed using Matlab on a GHz quad-core machine with GB memory. The spatial kernel for BLF is a Gaussian (covariance and ), while it is a box in PCA-NLM. The range kernel is Gaussian (covariance ) for both BLF and PCA-NLM. We have used the fast algorithm in [37] when is a Gaussian, and the Matlab routine “imfilter” when is a box. Note that we can also use other fast Gaussian filters [42, 38] if higher accuracy is desired.
Color BLF. The state-of-the-art fast algorithms for color BLF are Adaptive Manifolds (AM) [20], Permutohedral Lattice (PL) [19], and Global Color Sparseness (GCS) [21]. We have compared with them in Figure 2. The number of manifolds is set automatically in AM, whereas we have used clusters in GCS and for the Nystrm approximation. Following [20, 21], we used PSNR to measure the error between the brute-force and fast implementations. In Figure 2, notice that while our PSNR marginally exceeds that of GCS, it is however much better than PL and AM. Also notice the significant acceleration over the brute-force implementation obtained using our algorithm. We have also provided a table comparing the different methods on the Kodak dataset [46] in the supplement. The table shows that our method is better than GCS and PL when . As claimed in the introduction, we can see from the table that clustering provides a significant boost in filtering accuracy ( dB) over uniform sampling.
Color NLM. AM is the state-of-the-art fast algorithm for color NLM (and PCA-NLM). In NLM, , where is the patch radius [2]. On the other hand, is reduced to a smaller value in PCA-NLM using PCA. Following [45], we set to be three times the noise level for all experiments. Denoising results are shown in Figure 1, where and . For (b), (c), and (d), PCA was used to reduce the range dimension from to . We used clusters (resp. manifolds) for the Nystrm approximation (resp. AM). Following [50], we measured the denoising performance using PSNR and SSIM (between the clean and denoised images). Note that we are superior to AM both in terms of accuracy and timing. Importantly, our PSNR is close to PCA-NLM (the method being approximated), but we are about faster. In comparison with BM3D [51], our PSNR is dB less. However, our timing is about half that of BM3D, since our complexity is much less than that of BM3D. Additional visual comparisons and accuracy analysis is provided in the supplement.
Hyperspectral BLF. Finally, we present a denoising result for a hyperspectral image of size bands using BLF (). We have also compared with state-of-the-art methods for hyperspectral denoising [48, 49], whose parameters have been tuned accordingly. The results are shown in Figure 3. We have used landmarks for the Nystrm approximation. As a standard practice, the and SSIM values are averaged over the spectral bands. Notice that our method can restore details better, which results in higher PSNR/SSIM values. In particular, the color is not satisfactorily restored in [48] and grains can be seen in [49]. Being a one-shot method, we are much faster than [48, 49].
V Conclusion
We showed that fast bilateral and nonlocal means filtering of high-dimensional images can be performed using the Nystrm approximation. The proposed algorithm can significantly accelerate the brute-force implementation of these filters, without compromising the visual quality. In particular, our algorithm is often competitive with state-of-the-art fast algorithms, and comes with provable guarantee on the filtering accuracy.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” Proc. IEEE International Conference on Computer Vision , pp. 839–846, 1998.
- 2[2] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” Proc. IEEE Conference on Computer Vision and Pattern Recognition , vol. 2, pp. 60–65, 2005.
- 3[3] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “Bilateral filtering: Theory and Applications,” Foundations and Trends® in Computer Graphics and Vision , vol. 4, no. 1, pp. 1–73, 2009.
- 4[4] P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE Signal Processing Magazine , vol. 30, no. 1, pp. 106–128, 2013.
- 5[5] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images,” ACM Transactions on Graphics , vol. 21, no. 3, pp. 257–266, 2002.
- 6[6] S. Paris and F. Durand, “A fast approximation of the bilateral filter using a signal processing approach,” Proc. European Conference on Computer Vision , pp. 568–580, 2006.
- 7[7] J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Transactions on Graphics , vol. 26, no. 3, p. 103, 2007.
- 8[8] F. Porikli, “Constant time O(1) bilateral filtering,” Proc. IEEE Conference on Computer Vision and Pattern Recognition , pp. 1–8, 2008.
