Fast High-Dimensional Kernel Filtering

Pravin Nair; Kunal N. Chaudhury

arXiv:1901.06112·cs.CV·February 20, 2019


TL;DR

This paper introduces a scalable, fast kernel filtering method for high-dimensional images using the Nyström approximation, enabling efficient bilateral and nonlocal means filtering with theoretical error guarantees.

Contribution

It extends low-rank kernel approximation techniques to high-dimensional data via the Nyström method, overcoming scalability issues of previous approaches.

Findings

01 Effective filtering of color and hyperspectral images

02 Competitive performance with state-of-the-art algorithms

03 Theoretical bounds on approximation error

Abstract

The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nyström method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal…

Equations

\boldsymbol{g}(\boldsymbol{x})=\frac{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\kappa\big{(}\boldsymbol{p}(\boldsymbol{x}),\boldsymbol{p}(\boldsymbol{y})\big{)}\boldsymbol{f}(\boldsymbol{y})}{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\kappa\big{(}\boldsymbol{p}(\boldsymbol{x}),\boldsymbol{p}(\boldsymbol{y})\big{)}},

\mathfrak{R}=\big{\{}\boldsymbol{p}(\boldsymbol{x}):\boldsymbol{x}\in\Omega\big{\}}.

\iota(\boldsymbol{x})=\ell\quad\text{if}\quad\boldsymbol{r}_{\ell}=\boldsymbol{p}(\boldsymbol{x}).

\mathbf{K}(i,j)=\kappa(\boldsymbol{r}_{i},\boldsymbol{r}_{j}).

\boldsymbol{g}(\boldsymbol{x})=\frac{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\mathbf{K}\big{(}\iota(\boldsymbol{x}),\iota(\boldsymbol{y})\big{)}\boldsymbol{f}(\boldsymbol{y})}{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\mathbf{K}\big{(}\iota(\boldsymbol{x}),\iota(\boldsymbol{y})\big{)}}

\mathbf{K}=\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\boldsymbol{u}_{k}^{\top},

\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\left\{\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\big{(}\iota(\boldsymbol{x})\big{)}\boldsymbol{u}_{k}\big{(}\iota(\boldsymbol{y})\big{)}\right\}\boldsymbol{f}(\boldsymbol{y}).

\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\big{(}\iota(\boldsymbol{x})\big{)}(\omega\ast\boldsymbol{h}_{k})(\boldsymbol{x}),

\widehat{\mathbf{K}}=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\boldsymbol{v}_{k}^{\top},

\mathbf{A}(i,j)=\kappa(\boldsymbol{\mu}_{i},\boldsymbol{\mu}_{j})\qquad\big{(}i,j\in[1,m_{0}]\big{)}.

\mathbf{A}=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{w}_{k}\boldsymbol{w}_{k}^{\top},

\mathbf{B}(i,j)=\kappa(\boldsymbol{\mu}_{i},\boldsymbol{r}_{j}),

\boldsymbol{v}_{k}=\frac{1}{\alpha_{k}}\mathbf{B}^{\top}\!\boldsymbol{w}_{k}\qquad\big{(}k\in[1,m_{0}]\big{)}.

e=\sum_{i=1}^{m}\lVert\boldsymbol{r}_{i}-\boldsymbol{\mu}_{c(i)}\rVert^{2}

{\big{(}\kappa(\boldsymbol{x},\boldsymbol{y})-\kappa(\boldsymbol{w},\boldsymbol{z})\big{)}}^{2}\leq L\big{(}{\lVert(\boldsymbol{x}-\boldsymbol{w})\rVert}^{2}+{\lVert(\boldsymbol{y}-\boldsymbol{z})\rVert}^{2}\big{)}.

\lVert\mathbf{K}-\widehat{\mathbf{K}}\rVert_{\text{F}}\leq c_{1}\sqrt{e}+c_{2}e,

\hat{\boldsymbol{g}}(\boldsymbol{x})=\frac{1}{\hat{\eta}(\boldsymbol{x})}\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\big{(}\iota(\boldsymbol{x})\big{)}(\omega\ast\boldsymbol{h}_{k})(\boldsymbol{x}),

\hat{\eta}(\boldsymbol{x})=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\big{(}\iota(\boldsymbol{x})\big{)}(\omega\ast d_{k})(\boldsymbol{x}),

\lVert\hat{\boldsymbol{g}}-\boldsymbol{g}\rVert_{\infty}=\max_{\boldsymbol{x}\in\Omega}\lVert\hat{\boldsymbol{g}}(\boldsymbol{x})-\boldsymbol{g}(\boldsymbol{x})\rVert\leq C_{1}\sqrt{e}+C_{2}e,

Full text

Abstract

The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nyström method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal means filtering. We demonstrate the effectiveness of our proposal for bilateral and nonlocal means filtering of color and hyperspectral images. In particular, our method is shown to be competitive with state-of-the-art fast algorithms, and moreover it comes with a theoretical guarantee on the approximation error.

Index Terms:

Kernel Filter, Nyström Method, Approximation, Fast Algorithm, Error Bound.

I Introduction

The bilateral and nonlocal means filters [1, 2] are widely used for edge-preserving smoothing and denoising of images [3, 4]. These are instances of kernel filters, where the similarity (affinity) between pixels is measured using a symmetric kernel. We refer the reader to [4] for an excellent review of kernel filters. While they have proven to be useful in practice, a flip side of kernel filtering, including bilateral filtering (BLF) and nonlocal means (NLM), is their computational complexity [3]. Nevertheless, several fast algorithms have been proposed, e.g. [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26], which can speed up BLF and NLM, without compromising their filtering quality. See [21, 11, 26] for a survey of these algorithms. Unfortunately, most algorithms only work with grayscale images, and cannot be extended to color, multispectral, and hyperspectral images.

Algorithms for fast BLF of color images have been proposed in [12, 21, 27, 28, 29]. However, to the best of our knowledge, these methods have not been extended to multispectral and hyperspectral images. Fast algorithms for generic high-dimensional BLF and NLM have been proposed in [18, 19, 20, 30]. A common feature of these algorithms is that they use data clustering or tessellation in high dimensions. The state-of-the-art fast algorithms for color BLF are [19, 21], and that for color NLM is [20].

More recently, it was shown in [15, 17] that fast BLF of grayscale images can be performed using the partial eigendecomposition of the kernel matrix. In fact, the interpretation of BLF (and NLM) as kernel filters goes back to [31, 32, 33]. While the Nyström method has been widely used in machine learning [34, 35, 36], it appears that [31] is the first to apply it to image filtering. Note that, unlike [15, 17], the spatial and range kernels are treated as a single kernel in [31, 32, 33].

The differences between our and related approaches are:

$\bullet$ As explained in detail in §II, it is difficult to scale [15, 17] for filtering high-dimensional (even color) images, since one needs to populate a huge kernel matrix and compute its eigendecomposition. We propose to use the Nyström method to solve this problem. As a result, we are able to perform BLF and NLM of color and hyperspectral images.

$\bullet$ The first difference with [31, 32, 33] is that we use clustering instead of uniform sampling for the Nyström approximation. A significant improvement in filtering accuracy is achieved as a result. The other difference is that if a spatial kernel has to be incorporated in [31, 32, 33], then the Nyström approximation needs to be performed in the spatio-range space. In contrast, we handle the spatial and range components differently: fast convolutions are used for the spatial component and the Nyström approximation is used for the range component. As a result, we require fewer samples for the Nyström approximation.

$\bullet$ In [28, 29], clustering is used to compute “intermediate” images, which are interpolated to get the final output. On the other hand, clustering is used in our method just to obtain the “landmark points” for the Nyström approximation.

$\bullet$ Compared to [18, 19, 20, 21], our algorithm is conceptually simple and easy to implement. Moreover, we are able to derive a bound on the filtering error incurred by the approximation. Such a guarantee is not offered by [18, 19, 20, 21].

The rest of the paper is organized as follows. In §II, we introduce the notion of kernel filtering, and explain the core problem in relation to the spectral approximations in [15, 17]. We use the Nyström method in §III to overcome this problem. Numerical results are reported in §IV and we conclude in §V.

II Background

We begin by formulating BLF and NLM as kernel filters [4]. Suppose the input image is $\boldsymbol{f}:\Omega\to[0,R]^{n}$ , where $\Omega\subset\mathbb{Z}^{d}$ is the spatial domain, $[0,R]^{n}$ is the range space, and $d$ (resp. $n$ ) is the dimension of the domain (resp. range). Let $\boldsymbol{p}:\Omega\to[0,R]^{\rho}$ be the guide image, which is used to control the filtering. For standard BLF, $\boldsymbol{f}$ and $\boldsymbol{p}$ are identical, with $n=\rho=1$ for grayscale images and $n=\rho=3$ for color images. However, $\boldsymbol{f}$ and $\boldsymbol{p}$ (also $n$ and $\rho$ ) can be different for joint BLF [3]. For NLM, $\rho$ is generally larger than $n$ , where $\rho$ is the number of pixels in a patch [2]. Let $\kappa:\mathbb{R}^{\rho}\times\mathbb{R}^{\rho}\to\mathbb{R}$ be the range kernel. The filtered output $\boldsymbol{g}:\Omega\to[0,R]^{n}$ is given by

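$$\boldsymbol{g}(\boldsymbol{x})=\frac{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\,\kappa\big(\boldsymbol{p}(\boldsymbol{x}),\boldsymbol{p}(\boldsymbol{y})\big)\,\boldsymbol{f}(\boldsymbol{y})}{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\,\kappa\big(\boldsymbol{p}(\boldsymbol{x}),\boldsymbol{p}(\boldsymbol{y})\big)},\tag{1}$$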

where $W_{\boldsymbol{x}}$ is a square window around $\boldsymbol{x}\in\Omega$ consisting of $(2S+1)^{d}$ pixels, with $S$ being the window radius. The spatial kernel $\omega:\mathbb{Z}^{d}\to\mathbb{R}$ controls the weighting of the neighboring pixels involved in the averaging. At this point, we just assume that $\kappa$ is symmetric, i.e., $\kappa(\boldsymbol{t},\boldsymbol{s})=\kappa(\boldsymbol{s},\boldsymbol{t})$ for $\boldsymbol{t},\boldsymbol{s}\in\mathbb{R}^{\rho}$ . For example, $\kappa(\boldsymbol{t},\boldsymbol{s})=\exp(-\theta\lVert\boldsymbol{s}-\boldsymbol{t}\rVert^{2}),\theta>0$ , for standard BLF and NLM, where $\lVert\cdot\rVert$ is the Euclidean norm.
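For reference, the filter in (1) can be evaluated directly. The following is a minimal NumPy sketch for $d=2$ and a Gaussian range kernel, not the authors' implementation. The function name `brute_force_filter`, the Gaussian spatial kernel with standard deviation `sigma`, and the truncation of windows at the image border are assumptions made for illustration.

```python
import numpy as np

def brute_force_filter(f, p, S, sigma, theta):
    """Directly evaluate the kernel filter on a 2-D image.

    f: (H, W, n) input image, p: (H, W, rho) guide image.
    Spatial kernel: Gaussian with std sigma on a (2S+1) x (2S+1) window.
    Range kernel: kappa(t, s) = exp(-theta * ||s - t||^2).
    Windows are truncated at the image border (an assumption of this sketch).
    """
    H, W, n = f.shape
    num = np.zeros((H, W, n))
    den = np.zeros((H, W))
    for dy in range(-S, S + 1):
        for dx in range(-S, S + 1):
            w_sp = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
            # Pixels x for which the neighbor y = x + (dy, dx) lies in the image.
            y0, y1 = max(0, -dy), min(H, H - dy)
            x0, x1 = max(0, -dx), min(W, W - dx)
            px = p[y0:y1, x0:x1]                       # p(x)
            py = p[y0 + dy:y1 + dy, x0 + dx:x1 + dx]   # p(y)
            fy = f[y0 + dy:y1 + dy, x0 + dx:x1 + dx]   # f(y)
            w = w_sp * np.exp(-theta * np.sum((px - py) ** 2, axis=-1))
            num[y0:y1, x0:x1] += w[..., None] * fy
            den[y0:y1, x0:x1] += w
    return num / den[..., None]

# Standard BLF: the guide is the image itself.
rng = np.random.default_rng(0)
img = rng.random((8, 9, 3))
out = brute_force_filter(img, img, S=2, sigma=1.5, theta=5.0)
print(out.shape)
```

The cost grows with the window size $(2S+1)^{2}$, which is the per-pixel cost that the fast algorithm is designed to avoid.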

It was shown in [15, 17] that the non-linear operations in (1) can be computed using convolutions by approximating $\kappa$ . For convenience, we will describe this using our notations. Let the actual range of $\boldsymbol{p}$ be

$$\mathfrak{R}=\big\{\boldsymbol{p}(\boldsymbol{x}):\boldsymbol{x}\in\Omega\big\}.\tag{2}$$

We emphasize that $\mathfrak{R}$ is a list and not a set, i.e., we allow repetition of elements in $\mathfrak{R}$ . In particular, let $\mathfrak{R}=\{\boldsymbol{r}_{1},\boldsymbol{r}_{2},\ldots,\boldsymbol{r}_{m}\}$ be some ordering of the elements in $\mathfrak{R}$ , where $m$ is the number of elements. This means that, given $\ell\in[1,m]$ , $\boldsymbol{r}_{\ell}=\boldsymbol{p}(\boldsymbol{x})$ for some $\boldsymbol{x}\in\Omega$ . We track this correspondence using the index map $\iota:\Omega\to[1,m]$ , where

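$$\iota(\boldsymbol{x})=\ell\quad\text{if}\quad\boldsymbol{r}_{\ell}=\boldsymbol{p}(\boldsymbol{x}).\tag{3}$$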

We next define the kernel matrix $\mathbf{K}\in\mathbb{R}^{m\times m}$ given by

$$\mathbf{K}(i,j)=\kappa(\boldsymbol{r}_{i},\boldsymbol{r}_{j}).\tag{4}$$

In terms of (4), we can write (1) as

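$$\boldsymbol{g}(\boldsymbol{x})=\frac{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\,\mathbf{K}\big(\iota(\boldsymbol{x}),\iota(\boldsymbol{y})\big)\,\boldsymbol{f}(\boldsymbol{y})}{\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\,\mathbf{K}\big(\iota(\boldsymbol{x}),\iota(\boldsymbol{y})\big)}.\tag{5}$$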

It is clear from (4) that $\mathbf{K}$ is symmetric. In particular, let the eigendecomposition of $\mathbf{K}$ be

$$\mathbf{K}=\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\boldsymbol{u}_{k}^{\top},\tag{6}$$

where $\lambda_{1},\ldots,\lambda_{m}\in\mathbb{R}$ are its eigenvalues, and $\boldsymbol{u}_{1},\ldots,\boldsymbol{u}_{m}\in\mathbb{R}^{m}$ are the corresponding eigenvectors. Substituting (6) in (5), we can write its numerator as

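$$\sum_{\boldsymbol{y}\in W_{\boldsymbol{x}}}\omega(\boldsymbol{x}-\boldsymbol{y})\left\{\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\big(\iota(\boldsymbol{x})\big)\boldsymbol{u}_{k}\big(\iota(\boldsymbol{y})\big)\right\}\boldsymbol{f}(\boldsymbol{y}).$$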

On switching the sums, this becomes

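$$\sum_{k=1}^{m}\lambda_{k}\boldsymbol{u}_{k}\big(\iota(\boldsymbol{x})\big)(\omega\ast\boldsymbol{h}_{k})(\boldsymbol{x}),\tag{7}$$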

where $\omega\ast\boldsymbol{h}_{k}$ denotes the convolution of the image $\boldsymbol{h}_{k}(\boldsymbol{x})=\boldsymbol{u}_{k}\big{(}\iota(\boldsymbol{x})\big{)}\boldsymbol{f}(\boldsymbol{x})$ with $\omega$ . An identical argument applies for the denominator. In summary, we can compute (5) using convolutions, for which several efficient algorithms are available [37, 38]. Moreover, by considering just the largest eigenvalues, fast and accurate approximations can be obtained [15, 17].
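The identity behind (7) can be checked numerically on a toy problem. The sketch below uses a 1-D signal ($d=1$) with eight gray levels, keeps all $m$ eigenpairs so that no approximation is involved, and compares the convolution-based evaluation with a direct evaluation of (5). Indexing $\mathfrak{R}$ by distinct values via `np.unique`, the zero padding at the boundary, and all parameter values are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, S, theta = 64, 3, 4.0
f = rng.integers(0, 8, size=N) / 7.0          # 1-D "image" with 8 gray levels
omega = np.exp(-np.arange(-S, S + 1) ** 2 / (2 * 2.0 ** 2))  # spatial kernel

# Range values r_1..r_m and index map iota (distinct values only).
r, iota = np.unique(f, return_inverse=True)
K = np.exp(-theta * (r[:, None] - r[None, :]) ** 2)   # K(i, j) = kappa(r_i, r_j)
lam, U = np.linalg.eigh(K)                            # K = sum_k lam_k u_k u_k^T

def conv(h):
    # (omega * h)(x) with zero padding outside the domain.
    return np.convolve(h, omega, mode="same")

num = np.zeros(N)
den = np.zeros(N)
for k in range(len(r)):
    uk = U[iota, k]                       # u_k(iota(x)) for every pixel x
    num += lam[k] * uk * conv(uk * f)     # h_k(x) = u_k(iota(x)) f(x)
    den += lam[k] * uk * conv(uk)
g_spectral = num / den

# Direct evaluation of the same filter for comparison.
g_direct = np.empty(N)
for x in range(N):
    ys = np.arange(max(0, x - S), min(N, x + S + 1))
    w = omega[ys - x + S] * K[iota[x], iota[ys]]
    g_direct[x] = np.sum(w * f[ys]) / np.sum(w)

print(np.max(np.abs(g_spectral - g_direct)))
```

With all eigenpairs retained the two results agree up to rounding; truncating the sum to the largest eigenvalues gives the approximation used in [15, 17].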

Unfortunately, computing the full kernel and its eigendecomposition becomes prohibitively expensive when $\rho$ is large. Just as an example, consider an $8$ -bit color image for which $R=255$ and $\rho=3$ . Even if we assume that $m$ is just $10\%$ of the maximum range cardinality ( $=256^{3}\approx 16.8\text{ million}$ ), we will still need to populate a roughly $1.7\text{ million}\times 1.7\text{ million}$ matrix, and compute its eigenvalues. The situation is worse for hyperspectral images, where $\rho$ is of the order of tens, or even hundreds.

III Proposed Method

Originally, the Nyström method was used for approximating the solution of functional eigenvalue problems [39, 40]. The method has found useful applications in machine learning and computer vision for approximating the eigendecomposition of large matrices [34, 35, 31]. In the present context, the goal is to approximate (6) using a decomposition of the form

$$\widehat{\mathbf{K}}=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\boldsymbol{v}_{k}^{\top},\tag{8}$$

where $\alpha_{k}\in\mathbb{R}$ and $\boldsymbol{v}_{k}\in\mathbb{R}^{m}$ . Clearly, the rank of $\widehat{\mathbf{K}}$ is at most $m_{0}$ . Thus, for small $m_{0}$ , $\widehat{\mathbf{K}}$ is a low-rank approximation of $\mathbf{K}$ . A large $m_{0}$ results in a better approximation, but at higher computational cost. In practice, a good tradeoff is required.

The original kernel $\mathbf{K}$ is defined on $\mathfrak{R}$ . In the Nyström method [39, 40], we first construct a smaller kernel $\mathbf{A}$ , compute its eigendecomposition, and then “extrapolate” the eigenvectors of $\mathbf{A}$ to approximate those of $\mathbf{K}$ . More precisely, we pick a few landmark points from $\mathfrak{R}$ , say, $\mathfrak{R}_{0}=\{\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{m_{0}}\}$ , and define a kernel $\mathbf{A}\in\mathbb{R}^{m_{0}\times m_{0}}$ on $\mathfrak{R}_{0}$ :

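$$\mathbf{A}(i,j)=\kappa(\boldsymbol{\mu}_{i},\boldsymbol{\mu}_{j})\qquad\big(i,j\in[1,m_{0}]\big).\tag{9}$$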

Clearly, $\mathbf{A}$ is symmetric, and its size is much smaller than $\mathbf{K}$ . Thus, we can efficiently compute its eigendecomposition:

$$\mathbf{A}=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{w}_{k}\boldsymbol{w}_{k}^{\top},\tag{10}$$

where $\alpha_{k}\in\mathbb{R}$ and $\boldsymbol{w}_{k}\in\mathbb{R}^{m_{0}}$ . We next construct $\mathbf{B}\in\mathbb{R}^{m_{0}\times m}$ on $\mathfrak{R}_{0}\times\mathfrak{R}$ given by

$$\mathbf{B}(i,j)=\kappa(\boldsymbol{\mu}_{i},\boldsymbol{r}_{j}),\tag{11}$$

where $i\in[1,m_{0}]$ and $j\in[1,m]$ . This captures the kernel values between the points in $\mathfrak{R}$ and the landmark points. This matrix is used to extrapolate $\boldsymbol{w}_{k}$ as follows:

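$$\boldsymbol{v}_{k}=\frac{1}{\alpha_{k}}\mathbf{B}^{\top}\boldsymbol{w}_{k}\qquad\big(k\in[1,m_{0}]\big).\tag{12}$$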

This completes the specification of $\alpha_{k}$ and $\boldsymbol{v}_{k}$ in (8). We refer the reader to [35] for the intuition behind the approximation. The effective speedup of replacing (6) by (8) is $\mathcal{O}\big((m/m_{0})^{3}\big)$ . This is because the complexity of eigendecomposition of a $k\times k$ matrix is $\mathcal{O}(k^{3})$ [41]. In particular, the speedup is significant since $m_{0}\ll m$ . As will be evident shortly, we just need to compute $(\alpha_{k})$ and $(\boldsymbol{v}_{k})$ ; we will not use $\widehat{\mathbf{K}}$ explicitly.
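The construction of $(\alpha_{k})$ and $(\boldsymbol{v}_{k})$ can be sketched in a few lines of NumPy. This is an illustration only: the helper names `gaussian_kernel` and `nystrom`, the random data, and the choice of landmarks as a random subset of $\mathfrak{R}$ are assumptions. The full matrix $\mathbf{K}$ is formed here solely to measure the error, which is feasible because $m$ is small in this toy setting.

```python
import numpy as np

def gaussian_kernel(X, Y, theta):
    """kappa(x, y) = exp(-theta * ||x - y||^2) for all rows of X and Y."""
    d2 = np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-theta * np.maximum(d2, 0.0))

def nystrom(R, landmarks, theta):
    """Return (alpha, V) with K_hat = sum_k alpha_k v_k v_k^T."""
    A = gaussian_kernel(landmarks, landmarks, theta)   # m0 x m0 landmark kernel
    B = gaussian_kernel(landmarks, R, theta)           # m0 x m cross kernel
    alpha, Wmat = np.linalg.eigh(A)                    # A = sum_k alpha_k w_k w_k^T
    V = (B.T @ Wmat) / alpha                           # column k is B^T w_k / alpha_k
    return alpha, V

rng = np.random.default_rng(1)
R = rng.random((500, 3))                               # m = 500, rho = 3
landmarks = R[rng.choice(500, size=10, replace=False)] # m0 = 10
alpha, V = nystrom(R, landmarks, theta=5.0)

K = gaussian_kernel(R, R, 5.0)
K_hat = (V * alpha) @ V.T                              # formed only for checking
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(rel_err)
```

Algebraically, the resulting $\widehat{\mathbf{K}}$ equals $\mathbf{B}^{\top}\mathbf{A}^{-1}\mathbf{B}$ whenever $\mathbf{A}$ is invertible, which is a convenient way to validate an implementation.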

Following [36], we select the landmark points by clustering $\mathfrak{R}$ . More specifically, we partition $\mathfrak{R}$ into $m_{0}$ disjoint sets using $k$ -means clustering, and take the centroids to be the landmarks. Note that, though $\mathfrak{R}_{0}$ is not guaranteed to be a subset of $\mathfrak{R}$ , we can still apply the above approximation.

It was shown in [36] that the kernel error can be bounded by the quantization error. More specifically, let $\lVert\mathbf{K}-\widehat{\mathbf{K}}\rVert_{\text{F}}$ be the kernel error ( $\lVert\cdot\rVert_{\text{F}}$ is the Frobenius norm), and let

$$e=\sum_{i=1}^{m}\lVert\boldsymbol{r}_{i}-\boldsymbol{\mu}_{c(i)}\rVert^{2}$$

be the quantization error, where $c(i)$ is the minimizer of $\lVert\boldsymbol{r}_{i}-\boldsymbol{\mu}_{j}\rVert$ over $j\in[1,m_{0}]$ . Then the following bound holds [36].

Proposition 1

Suppose there exists some $L>0$ such that, for $\boldsymbol{w},\boldsymbol{x},\boldsymbol{y},\boldsymbol{z}\in\mathfrak{R}$ ,

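$$\big(\kappa(\boldsymbol{x},\boldsymbol{y})-\kappa(\boldsymbol{w},\boldsymbol{z})\big)^{2}\leq L\big(\lVert\boldsymbol{x}-\boldsymbol{w}\rVert^{2}+\lVert\boldsymbol{y}-\boldsymbol{z}\rVert^{2}\big).\tag{13}$$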

Then the approximation error can be bounded as

$$\lVert\mathbf{K}-\widehat{\mathbf{K}}\rVert_{\text{F}}\leq c_{1}\sqrt{e}+c_{2}e,$$

where the positive constants $c_{1}$ and $c_{2}$ do not depend on $e$ . In particular, (13) holds when $\kappa$ is a Gaussian.

Proposition 1 suggests that we can reduce the kernel error by making $e$ small. Now, $e$ measures how well $\mathfrak{R}$ is represented by the landmark points, and it is precisely the objective that $k$ -means clustering tries to minimize. Following this observation, $k$ -means clustering was used in [36] for determining the landmarks. It was empirically shown in [36] that clustering indeed results in smaller error than uniform sampling [35, 31]. We will see that this is also true for our algorithm.
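The relation between landmark selection and the two error measures can be explored with a small experiment. The sketch below runs plain Lloyd iterations started from a uniformly sampled set of landmarks and reports the quantization error $e$ and the kernel error $\lVert\mathbf{K}-\widehat{\mathbf{K}}\rVert_{\text{F}}$ for both choices. The synthetic data, the helper names, and the parameter values are assumptions. Lloyd iterations cannot increase $e$ relative to their starting point, whereas the change in the kernel error is an empirical observation and is not guaranteed by this code.

```python
import numpy as np

def sqdist(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

def kmeans(R, init, iters=25):
    """Plain Lloyd iterations started from the given initial centroids."""
    mu = init.copy()
    for _ in range(iters):
        c = sqdist(R, mu).argmin(1)            # nearest centroid c(i)
        for j in range(len(mu)):
            if np.any(c == j):                 # leave empty clusters unchanged
                mu[j] = R[c == j].mean(0)
    return mu

def quantization_error(R, mu):
    """e = sum_i ||r_i - mu_c(i)||^2 with c(i) the nearest landmark."""
    return sqdist(R, mu).min(1).sum()

def kernel_error(R, mu, theta):
    """Frobenius norm of K - B^T A^{-1} B for a Gaussian range kernel."""
    A = np.exp(-theta * sqdist(mu, mu))
    B = np.exp(-theta * sqdist(mu, R))
    K = np.exp(-theta * sqdist(R, R))
    return np.linalg.norm(K - B.T @ np.linalg.solve(A, B))

rng = np.random.default_rng(0)
R = rng.random((400, 3))                                   # toy range vectors
uniform = R[rng.choice(len(R), size=8, replace=False)]     # uniform sampling
clustered = kmeans(R, uniform)                             # k-means centroids
for name, mu in [("uniform", uniform), ("k-means", clustered)]:
    print(name, quantization_error(R, mu), kernel_error(R, mu, theta=5.0))
```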

We arrive at a fast algorithm by replacing $\mathbf{K}$ by $\widehat{\mathbf{K}}$ . It is clear from (7) that the resulting approximation is given by

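$$\hat{\boldsymbol{g}}(\boldsymbol{x})=\frac{1}{\hat{\eta}(\boldsymbol{x})}\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\big(\iota(\boldsymbol{x})\big)(\omega\ast\boldsymbol{h}_{k})(\boldsymbol{x}),\tag{14}$$

$$\hat{\eta}(\boldsymbol{x})=\sum_{k=1}^{m_{0}}\alpha_{k}\boldsymbol{v}_{k}\big(\iota(\boldsymbol{x})\big)(\omega\ast d_{k})(\boldsymbol{x}),\tag{15}$$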

where $d_{k}:\Omega\to\mathbb{R}$ and $\boldsymbol{h}_{k}:\Omega\to\mathbb{R}^{n}$ are defined as $d_{k}(\boldsymbol{x})=\boldsymbol{v}_{k}(\iota(\boldsymbol{x}))$ and $\boldsymbol{h}_{k}(\boldsymbol{x})=d_{k}(\boldsymbol{x})\boldsymbol{f}(\boldsymbol{x})$ .

The computation of (14) and (15) involves $(n+1)m_{0}$ convolutions, since for each $k\in[1,m_{0}]$ , there are $n$ convolutions in (14) and one in (15). The main point is that we have been able to express the non-linear kernel filter using convolutions, for which efficient algorithms are available. In particular, (14) and (15) can be performed using $\mathcal{O}(1)$ operations (w.r.t. the size of the spatial kernel), when $\omega$ is a box or Gaussian [42, 37, 38]. The overall algorithm is described in Algorithm 1 (source code in [43]), where the symbols $\oplus,\otimes$ and $\oslash$ are used to denote pixelwise addition, multiplication, and division. The complexity of $k$ -means clustering and the eigendecomposition of $\mathbf{A}$ are $\mathcal{O}(|\Omega|m_{0}\rho)$ [44] and $\mathcal{O}({m_{0}}^{3})$ [41]. On the other hand, the complexity of the convolutions in (14) and (15) is $\mathcal{O}(|\Omega|m_{0}(n+\rho))$ , where $|\Omega|$ is the number of pixels. Since the complexity of the brute-force implementation is $\mathcal{O}\big{(}|\Omega|(2S+1)^{d}(n+\rho)\big{)}$ [3], and convolutions are the dominant operations in our algorithm, we obtain an effective speedup of $(2S+1)^{d}/m_{0}$ . This is significant as $S$ is typically large [3].
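A compact NumPy sketch of the pipeline described above is given next. It is not the authors' Matlab code [43]. Here $\mathfrak{R}$ is taken to be the list of per-pixel guide vectors, so that $\iota$ is simply the pixel index. The separable zero-padded convolution `conv2_same`, the function names, and the test image are assumptions. As a sanity check, the demo uses an image whose colors all belong to a small palette and takes the palette as the landmarks; in that special case $\widehat{\mathbf{K}}$ agrees with $\mathbf{K}$ on every pair of pixels, so the fast and brute-force outputs should coincide up to rounding.

```python
import numpy as np

def sqdist(X, Y):
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

def conv2_same(img, w1d):
    """Separable 'same' convolution with zero padding along both image axes."""
    out = np.apply_along_axis(np.convolve, 0, img, w1d, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, w1d, mode="same")

def fast_kernel_filter(f, p, landmarks, w1d, theta):
    H, W, n = f.shape
    P = p.reshape(-1, p.shape[-1])                  # one range vector per pixel
    A = np.exp(-theta * sqdist(landmarks, landmarks))
    B = np.exp(-theta * sqdist(landmarks, P))       # m0 x |Omega|
    alpha, Wm = np.linalg.eigh(A)
    V = (B.T @ Wm) / alpha                          # v_k evaluated at every pixel
    num = np.zeros((H, W, n))
    den = np.zeros((H, W))
    for k in range(len(alpha)):
        d_k = V[:, k].reshape(H, W)                 # d_k(x) = v_k(iota(x))
        h_k = d_k[..., None] * f                    # h_k(x) = d_k(x) f(x)
        num += alpha[k] * d_k[..., None] * conv2_same(h_k, w1d)
        den += alpha[k] * d_k * conv2_same(d_k, w1d)
    return num / den[..., None]

def brute_force(f, p, w1d, theta):
    """Direct evaluation with the same separable spatial kernel."""
    H, W, n = f.shape
    S = len(w1d) // 2
    g = np.empty_like(f)
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - S), min(H, i + S + 1)
            j0, j1 = max(0, j - S), min(W, j + S + 1)
            w_sp = np.outer(w1d[i0 - i + S:i1 - i + S], w1d[j0 - j + S:j1 - j + S])
            w = w_sp * np.exp(-theta * ((p[i0:i1, j0:j1] - p[i, j]) ** 2).sum(-1))
            g[i, j] = (w[..., None] * f[i0:i1, j0:j1]).sum((0, 1)) / w.sum()
    return g

rng = np.random.default_rng(3)
palette = rng.random((6, 3))                            # six colors in [0, 1]^3
img = palette[rng.integers(0, 6, size=(12, 14))]        # image using only these colors
w1d = np.exp(-np.arange(-2, 3) ** 2 / (2 * 1.5 ** 2))   # 1-D Gaussian, S = 2
g_fast = fast_kernel_filter(img, img, palette, w1d, theta=2.0)
g_ref = brute_force(img, img, w1d, theta=2.0)
print(np.max(np.abs(g_fast - g_ref)))
```

For natural images the landmarks would instead be the $m_{0}$ centroids obtained by clustering, and the output is then an approximation of (1). The generic `np.convolve` used here should be replaced by an $\mathcal{O}(1)$ box or Gaussian filter to obtain the complexity stated above.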

We now comment on the filtering accuracy, namely, how well (1) is approximated by (14). Intuitively, we expect the approximation to be accurate if $\widehat{\mathbf{K}}\approx\mathbf{K}$ . In fact, since the difference $\lVert\mathbf{K}-\widehat{\mathbf{K}}\rVert_{\mathrm{F}}$ is controlled by the quantization error (Proposition 1), we have the following result.

Theorem 2

Suppose $\omega$ and $\kappa$ are positive, and $\kappa$ satisfies the property in Proposition 1. Then

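$$\lVert\hat{\boldsymbol{g}}-\boldsymbol{g}\rVert_{\infty}=\max_{\boldsymbol{x}\in\Omega}\lVert\hat{\boldsymbol{g}}(\boldsymbol{x})-\boldsymbol{g}(\boldsymbol{x})\rVert\leq C_{1}\sqrt{e}+C_{2}e,$$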

where $C_{1},C_{2}>0$ do not depend on $e$ .

The main steps of the derivation are given in the supplement. Theorem 2 is true for BLF and NLM, where $\kappa$ is a Gaussian. A practical implication of this result is that the filtering accuracy is guaranteed to increase with $m_{0}$ (Figure $4$ in the supplement). Deriving a similar bound is difficult for [18, 19, 20, 21].

IV Results

We demonstrate the effectiveness of our algorithm for BLF and NLM of high-dimensional images by comparing it with state-of-the-art algorithms. Instead of standard NLM [2], we have used PCA-NLM [45], where the denoising performance of the former is improved by applying PCA on the collection of patches. As for the dataset, we have used the color images from [46] and the hyperspectral images from [47]. Experiments were performed using Matlab on a $3.4$ GHz quad-core machine with $32$ GB memory. The spatial kernel $\omega$ for BLF is a Gaussian (covariance $\sigma^{2}\mathbf{I}$ and $S=3\sigma$ ), while it is a box in PCA-NLM. The range kernel $\kappa$ is Gaussian (covariance $\theta^{2}\mathbf{I}$ ) for both BLF and PCA-NLM. We have used the fast $\mathcal{O}(1)$ algorithm in [37] when $\omega$ is a Gaussian, and the Matlab routine “imfilter” when $\omega$ is a box. Note that we can also use other fast Gaussian filters [42, 38] if higher accuracy is desired.

Color BLF. The state-of-the-art fast algorithms for color BLF are Adaptive Manifolds (AM) [20], Permutohedral Lattice (PL) [19], and Global Color Sparseness (GCS) [21]. We have compared with them in Figure 2. The number of manifolds is set automatically in AM, whereas we have used $15$ clusters in GCS and for the Nyström approximation. Following [20, 21], we used PSNR to measure the error between the brute-force and fast implementations. In Figure 2, notice that our PSNR marginally exceeds that of GCS and is much better than that of PL and AM. Also notice the significant acceleration over the brute-force implementation obtained using our algorithm. We have also provided a table comparing the different methods on the Kodak dataset [46] in the supplement. The table shows that our method is better than GCS and PL when $\theta>40$ . As claimed in the introduction, we can see from the table that clustering provides a significant boost in filtering accuracy ( $10\mbox{-}20$ dB) over uniform sampling.
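For concreteness, the PSNR between a brute-force output and a fast approximation can be computed as below. The standard definition $10\log_{10}(\text{peak}^{2}/\text{MSE})$ with peak value $R=255$ is assumed here; the text does not spell out the exact convention used in the experiments.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# An error of one gray level at every pixel gives MSE = 1.
ref = np.zeros((4, 4, 3))
print(psnr(ref, ref + 1.0))
```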

Color NLM. AM is the state-of-the-art fast algorithm for color NLM (and PCA-NLM). In NLM, $\rho=3(2r+1)^{2}$ , where $r$ is the patch radius [2]. On the other hand, $\rho$ is reduced to a smaller value in PCA-NLM using PCA. Following [45], we set $\theta$ to be three times the noise level for all experiments. Denoising results are shown in Figure 1, where $S=10$ and $r=3$ . For (b), (c), and (d), PCA was used to reduce the range dimension from $3\times 7^{2}$ to $25$ . We used $31$ clusters (resp. manifolds) for the Nyström approximation (resp. AM). Following [50], we measured the denoising performance using PSNR and SSIM (between the clean and denoised images). Note that we are superior to AM both in terms of accuracy and timing. Importantly, our PSNR is close to PCA-NLM (the method being approximated), but we are about $160\times$ faster. In comparison with BM3D [51], our PSNR is $3$ dB less. However, our timing is about half that of BM3D, since our complexity is much less than that of BM3D. Additional visual comparisons and accuracy analysis are provided in the supplement.
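The guide image for PCA-NLM can be sketched as follows: each pixel is described by its $(2r+1)\times(2r+1)$ color patch, giving $\rho=3(2r+1)^{2}=147$ for $r=3$, and PCA then projects these vectors onto the leading $25$ principal directions. The reflective border padding, the mean-centering, and the function names are assumptions of this illustration and need not match the implementation in [45].

```python
import numpy as np

def patch_features(img, r):
    """Stack the (2r+1) x (2r+1) color patch around each pixel into one vector."""
    H, W, C = img.shape
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    feats = [padded[dy:dy + H, dx:dx + W]
             for dy in range(2 * r + 1) for dx in range(2 * r + 1)]
    return np.concatenate(feats, axis=-1).reshape(H * W, -1)

def pca_reduce(X, dim):
    """Project the rows of X onto the top `dim` principal directions."""
    Xc = X - X.mean(0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

rng = np.random.default_rng(0)
img = rng.random((12, 10, 3))
X = patch_features(img, r=3)      # one 147-dimensional vector per pixel
Z = pca_reduce(X, 25)             # reduced guide vectors, rho = 25
print(X.shape, Z.shape)
```

The rows of `Z`, reshaped to the image grid, would play the role of the guide $\boldsymbol{p}$ in (1).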

Hyperspectral BLF. Finally, we present a denoising result for a hyperspectral image of size $(610\times 340)\times 103$ bands using BLF ( $\sigma=3,\theta=100$ ). We have also compared with state-of-the-art methods for hyperspectral denoising [48, 49], whose parameters have been tuned accordingly. The results are shown in Figure 3. We have used $m_{0}=32$ landmarks for the Nyström approximation. As a standard practice, the $\mathrm{PSNR}$ and SSIM values are averaged over the spectral bands. Notice that our method can restore details better, which results in higher PSNR/SSIM values. In particular, the color is not satisfactorily restored in [48] and grains can be seen in [49]. Being a one-shot method, we are much faster than [48, 49].

V Conclusion

We showed that fast bilateral and nonlocal means filtering of high-dimensional images can be performed using the Nyström approximation. The proposed algorithm can significantly accelerate the brute-force implementation of these filters, without compromising the visual quality. In particular, our algorithm is often competitive with state-of-the-art fast algorithms, and comes with a provable guarantee on the filtering accuracy.

Bibliography

[1] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” Proc. IEEE International Conference on Computer Vision, pp. 839–846, 1998.
[2] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65, 2005.
[3] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “Bilateral filtering: Theory and Applications,” Foundations and Trends® in Computer Graphics and Vision, vol. 4, no. 1, pp. 1–73, 2009.
[4] P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, 2013.
[5] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images,” ACM Transactions on Graphics, vol. 21, no. 3, pp. 257–266, 2002.
[6] S. Paris and F. Durand, “A fast approximation of the bilateral filter using a signal processing approach,” Proc. European Conference on Computer Vision, pp. 568–580, 2006.
[7] J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Transactions on Graphics, vol. 26, no. 3, p. 103, 2007.
[8] F. Porikli, “Constant time O(1) bilateral filtering,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2008.