Statistical learning of rational wavelet transform for natural images

Naushad Ansari; Anubha Gupta

arXiv:1705.00821·cs.CV·May 3, 2017


TL;DR

This paper introduces a statistical learning method for rational wavelet transforms tailored for natural images, demonstrating improved performance in compressed sensing reconstruction over standard wavelet transforms.

Contribution

It proposes a novel Rational Wavelet Transform Learning in Statistical sense (RWLS) method using a lifting framework with a closed form solution.

Findings

1. RWLS outperforms standard dyadic wavelet transforms in image reconstruction.
2. The method is effective for compressed sensing applications.
3. The closed-form solution simplifies the learning process.

Abstract

Motivated by the concept of transform learning and the utility of the rational wavelet transform in audio and speech processing, this paper proposes Rational Wavelet Transform Learning in Statistical sense (RWLS) for natural images. The proposed RWLS design is carried out via the lifting framework and is shown to have a closed-form solution. The efficacy of the learned transform is demonstrated in compressed sensing (CS) based reconstruction, where the learned RWLS is observed to perform better than the existing standard dyadic wavelet transforms.


Equations

$$G_{h}^{new}(z) = G_{h}(z) - G_{l}(z)\,T(z^{2}).$$

$$F_{l}^{new}(z) = F_{l}(z) + F_{h}(z)\,T(z^{2}).$$

$$G_{l}^{new}(z) = G_{l}(z) + G_{h}(z)\,S(z^{2}).$$

$$F_{h}^{new}(z) = F_{h}(z) - F_{l}(z)\,S(z^{2}).$$

$$G_{l}(z) = G_{0}(z^{2}) + z^{3} G_{1}(z^{2}).$$

$$F_{l}(z) = F_{0}(z^{2}) + z^{-3} F_{1}(z^{2}),$$

$$r_{B}^{H}[n_{1}, n_{2}] = \frac{\sigma_{H}^{2}}{2}\left(|n_{1}|^{2H} - |n_{1} - n_{2}|^{2H} + |n_{2}|^{2H}\right),$$

$$G_{i}(z) = z^{i}, \quad i = 0, 1, 2$$

$$F_{i}(z) = z^{-i}, \quad i = 0, 1, 2.$$

$$a[n] = \begin{cases} x\left[\frac{3n}{2}\right] & \text{if } n \text{ is even} \\ x\left[\frac{3n-1}{2}\right] & \text{if } n \text{ is odd,} \end{cases}$$

$$d[n] = x[3n+2]$$

$$T(z) = t_{0} z + t_{1} z^{2},$$

$$d^{new}[n] = x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3].$$

$$e[n] = d^{new}[n] = x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3].$$

$$\zeta[n] = E(e^{2}[n]) = E\left(\{x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3]\}^{2}\right),$$

$$\frac{\partial \zeta}{\partial \mathbf{t}} = 0 \;\Longrightarrow\; E[\mathbf{A}^{\prime}\mathbf{A}]\,\mathbf{t} = E[\mathbf{A}^{\prime}\mathbf{b}]$$

$$G_{h}^{new}(z) = G_{h}(z) - \sum_{k=0}^{1} G_{l}\left(z^{\frac{1}{2}} W_{2}^{2k}\right) T\left(z^{\frac{3}{2}} W_{2}^{3k}\right),$$

$$\mathbf{R}(z)\,\mathbf{E}(z) = c\,z^{-n_{0}}\,\mathbf{I},$$

$$\tilde{\mathbf{s}} = \min_{\mathbf{s}} \|\mathbf{x} - \mathbf{x}_{u}\|^{2},$$

$$G_{l}^{new}(z) = G_{l}(z) + G_{h}(z^{2})\,S(z^{3})$$


Full text


**Index Terms:** Rational Wavelet, Statistically Matched Wavelet, Natural Images, Lifting Framework

1 Introduction

Transform learning (TL) is an active research area in which the sparsifying transform and the transform-domain signal are learned together, under some constraints, for a class of signals. TL is currently used in several applications, including image/video denoising and MRI reconstruction [1, 2, 3]. Despite this active use, the TL problem is non-convex and has no closed-form solution, which makes it difficult to solve. Hence, greedy algorithms are used to solve it.

Among existing transforms, the discrete wavelet transform (DWT) is widely used because it represents signals efficiently [4]. The non-uniqueness of the wavelet basis motivates learning the wavelet transform for a given application. Wavelet transform learning can be viewed as a specific case of transform learning. Since the integer translates of the associated wavelet filters form a basis of the $l_{2}$ space, wavelet transform learning corresponds to learning the wavelet filters.

Generally, the dyadic wavelet transform is used, which decomposes the input signal spectrum into two uniform frequency bands via a two-channel filterbank. The rational wavelet transform (RWT), on the other hand, provides a non-uniform frequency band representation of the signal spectrum that has been found useful in some applications [5, 6]. RWT has also been used in pattern recognition [7] and feature extraction [8]. Although methods for RWT design have been presented in the literature [9, 10, 11, 12], the designed wavelets are independent of the signal of interest. Recently, a method was proposed in [13] to learn a rational wavelet deterministically from a given signal. Since [13] requires the full signal, it cannot be used in inverse problems such as CS-based reconstruction, where one does not have access to the full original signal.

This paper proposes rational wavelet learning for natural images. It has been shown in [14] that natural images can be modeled as fractional Brownian motion (fBm) processes. fBm processes are Gaussian, non-stationary random processes with stationary increments; they form a class of statistically self-similar processes [15] and have been used widely in image processing [16, 17].

The above discussion of transform learning, the flexibility of the rational wavelet transform, and the modeling of natural images via fBm processes motivate us to learn a rational wavelet transform for natural images in the statistical sense. Specifically, the statistics of a set of natural images are used to learn a separable rational wavelet transform for this class of images. The proposed method, called RWLS, uses the lifting framework for rational wavelets introduced in [13]. The proposed formulation leads to a convex problem that can be solved by least squares, making the RWLS method computationally efficient.

The salient contributions of this work are as follows:

1. Statistical learning of a rational separable wavelet transform for natural images is proposed.
2. The lifting framework, which is friendly to Digital Signal Processing (DSP) hardware, is used in the proposed method, making the learned transform easy to implement in hardware.
3. Unlike conventional TL, the proposed formulation leads to a convex problem that can be solved easily.
4. The proposed RWLS is applied to compressed sensing based reconstruction and is observed to perform better than the existing dyadic wavelet transforms.

2 Brief Background

2.1 Lifting in Dyadic Wavelet

The lifting methodology supports customized wavelet design [18], [19]. The design is modular, guarantees perfect reconstruction, and allows non-linear filters to be part of the wavelet structure. A general lifting scheme consists of three steps: Split, Predict, and Update (refer to Fig. 1). In the Split step, the input signal is divided into even-indexed samples $x_{e}[n]$ and odd-indexed samples $x_{o}[n]$. The corresponding filterbank structure is called the Lazy wavelet system [19]. It is converted to a conventional wavelet system using successive predict and update stage filters, as shown in Fig. 2, with the analysis filters labeled $G_{l}(z)=Z\{g_{l}[n]\}$, $G_{h}(z)=Z\{g_{h}[n]\}$ and the synthesis filters labeled $F_{l}(z)=Z\{f_{l}[n]\}$, $F_{h}(z)=Z\{f_{h}[n]\}$.

In the Predict Lifting step, odd samples are predicted from the neighboring even samples using the predictor $P\equiv T(z)$ or vice-versa. This step modifies the analysis highpass filter and the synthesis lowpass filter as:

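$$G_{h}^{new}(z) = G_{h}(z) - G_{l}(z)\,T(z^{2}), \tag{1}$$

$$F_{l}^{new}(z) = F_{l}(z) + F_{h}(z)\,T(z^{2}). \tag{2}$$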

The Update Lifting step modifies the analysis lowpass filter and the synthesis highpass filter. The update step filter is denoted with the symbol $U\equiv S(z)$ and the related equations are given as:

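$$G_{l}^{new}(z) = G_{l}(z) + G_{h}(z)\,S(z^{2}), \tag{3}$$

$$F_{h}^{new}(z) = F_{h}(z) - F_{l}(z)\,S(z^{2}). \tag{4}$$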
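As an illustration of the Split, Predict, and Update steps in the time domain, the following is a minimal sketch of one dyadic lifting stage. The particular predictor and updater (two-tap averages with periodic boundary handling) are assumptions chosen for the example and are not taken from the paper; any pair of functions can be substituted, and the inverse still recovers the input because each step is undone in reverse order.

```python
import numpy as np

def lifting_forward(x, predict, update):
    """One dyadic lifting stage: split, predict, update (even-length input)."""
    even = np.asarray(x[0::2], dtype=float)   # Split: even-indexed samples
    odd = np.asarray(x[1::2], dtype=float)    # Split: odd-indexed samples
    detail = odd - predict(even)              # Predict: residual of the odd samples
    approx = even + update(detail)            # Update: correct the even samples
    return approx, detail

def lifting_inverse(approx, detail, predict, update):
    """Undo the update, then the predict, then merge the two branches."""
    even = approx - update(detail)
    odd = detail + predict(even)
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

# Hypothetical filters for illustration only (periodic boundaries via np.roll).
predict = lambda e: 0.5 * (e + np.roll(e, -1))
update = lambda d: 0.25 * (d + np.roll(d, 1))

x = np.arange(16.0) ** 2
approx, detail = lifting_forward(x, predict, update)
x_rec = lifting_inverse(approx, detail, predict, update)
```

Perfect reconstruction here does not depend on the choice of `predict` and `update`, which is the property that makes lifting suitable for learned filters.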

2.2 Rational Wavelet

Consider the 2-channel $(\frac{2}{3},\frac{1}{3})$ rational wavelet filterbank of Fig. 3(a), which can be converted into an equivalent uniformly decimated M-band structure (Fig. 3(b)). Filters $G_{0}(z)$ and $G_{1}(z)$ of Fig. 3(b) can be written as an equivalent filter $G_{l}(z)$ of Fig. 3(a) using the following equation:

$$G_{l}(z) = G_{0}(z^{2}) + z^{3} G_{1}(z^{2}). \tag{5}$$

Similarly, synthesis filters $F_{0}(z)$ and $F_{1}(z)$ of Fig. 3(b) can be written as an equivalent filter $F_{l}(z)$ of Fig. 3(a) using the following equation:

$$F_{l}(z) = F_{0}(z^{2}) + z^{-3} F_{1}(z^{2}), \tag{6}$$

while the other filters remain the same, i.e., $G_{h}(z)=G_{2}(z)$ and $F_{h}(z)=F_{2}(z)$.

2.3 Fractional Brownian Motion

Fractional Brownian motion $B_{H}(t)$ is a Gaussian, zero-mean, self-similar, non-stationary random process with stationary increments [20]. The auto-covariance of the corresponding discrete-time process $B_{H}[n]$ is given by:

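$$r_{B}^{H}[n_{1}, n_{2}] = \frac{\sigma_{H}^{2}}{2}\left(|n_{1}|^{2H} - |n_{1} - n_{2}|^{2H} + |n_{2}|^{2H}\right), \tag{7}$$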

where $\sigma_{H}^{2}=\mathrm{var}(B_{H}[1])=\frac{1}{\Gamma(2H+1)|\sin(\pi H)|}$, and $H$ is the self-similarity index, also called the Hurst exponent. The statistical properties of fBm processes are completely characterized by the single parameter $H$, which can be estimated using the maximum likelihood estimation method presented in [21].
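The auto-covariance above can be evaluated directly once $H$ is known. The sketch below is a plain transcription of the formula and of the stated expression for $\sigma_{H}^{2}$; the function names and the choice of indices $1,\dots,N$ for the covariance matrix are assumptions made for illustration.

```python
import math
import numpy as np

def fbm_variance(H):
    """sigma_H^2 = 1 / (Gamma(2H + 1) * |sin(pi * H)|), as stated in the text."""
    return 1.0 / (math.gamma(2.0 * H + 1.0) * abs(math.sin(math.pi * H)))

def fbm_autocov(n1, n2, H):
    """Auto-covariance r[n1, n2] of discrete-time fBm with Hurst exponent H."""
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    p = 2.0 * H
    return 0.5 * fbm_variance(H) * (
        np.abs(n1) ** p - np.abs(n1 - n2) ** p + np.abs(n2) ** p
    )

def fbm_cov_matrix(N, H):
    """N x N covariance matrix over sample indices 1..N (index range is an assumption)."""
    idx = np.arange(1, N + 1)
    return fbm_autocov(idx[:, None], idx[None, :], H)

C = fbm_cov_matrix(6, 0.8)
```

For $H = 0.5$ the expression reduces to $\min(n_{1}, n_{2})$, the covariance of ordinary Brownian motion, which gives a quick sanity check.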

3 Proposed RWLS Learning Method

This section presents the proposed RWLS learning method for a $(\frac{2}{3},\frac{1}{3})$ rational wavelet statistically matched to natural images. A separable two-dimensional (2D) rational wavelet is learned, which requires learning 1-D RWLS matched separately to the row space and the column space of natural images. The proposed strategy is identical for the row space and the column space. For ease of exposition, let us first consider the design for the column space.

3.1 Proposed Learning for the Column Space

Consider the initial architecture of a uniformly decimated 3-band Lazy wavelet with filters (Fig. 3(b)):

$$G_{i}(z) = z^{i}, \quad i = 0, 1, 2$$

$$F_{i}(z) = z^{-i}, \quad i = 0, 1, 2.$$

This Lazy wavelet is subsequently transformed to the equivalent $(\frac{2}{3},\frac{1}{3})$ rational wavelet via (5) and (6). On feeding the column-wise vectorized collection of natural images, labeled $x[n]$, through this rational Lazy wavelet filterbank, the following approximation $a[n]$ and detail $d[n]$ subband coefficients are obtained:

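$$a[n] = \begin{cases} x\left[\frac{3n}{2}\right] & \text{if } n \text{ is even} \\ x\left[\frac{3n-1}{2}\right] & \text{if } n \text{ is odd,} \end{cases}$$

$$d[n] = x[3n+2]$$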
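In other words, the rational Lazy wavelet keeps two out of every three samples in the approximation branch and sends the third to the detail branch. A minimal sketch of this split and its inverse is given below; the function names are hypothetical, and truncating the input to a multiple of three samples is an assumption made to keep the example short.

```python
import numpy as np

def rational_lazy_split(x):
    """Split x into a[n] (samples 3k, 3k+1) and d[n] = x[3n+2]."""
    x = np.asarray(x)
    n_blocks = x.size // 3
    blocks = x[: 3 * n_blocks].reshape(n_blocks, 3)  # drop any trailing remainder
    approx = blocks[:, :2].reshape(-1)               # x[3k], x[3k+1] interleaved
    detail = blocks[:, 2]                            # x[3k+2]
    return approx, detail

def rational_lazy_merge(approx, detail):
    """Inverse of rational_lazy_split."""
    blocks = np.column_stack([np.reshape(approx, (-1, 2)), detail])
    return blocks.reshape(-1)

x = np.arange(12)
a, d = rational_lazy_split(x)
x_rec = rational_lazy_merge(a, d)
```

The approximation branch runs at two thirds of the input rate and the detail branch at one third, which is why the predict and update stages that follow need a rate converter.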

Next, the lowpass and highpass filters of the Lazy rational wavelet structure are lifted via predict and update stage polynomials, which are learned as explained in the following subsections.

3.1.1 Predict stage

In the predict stage, one branch of samples is predicted with the help of the other branch. In the rational wavelet structure, this requires the rate converter proposed in [13], because the output sample rates of the two branches are unequal (refer to Fig. 4). Considering the predict polynomial filter $T(z)$ as

$$T(z) = t_{0} z + t_{1} z^{2}, \tag{11}$$

we obtain

$$d^{new}[n] = x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3]. \tag{12}$$

Thus, the choice of $T(z)$ in (11) allows $d[n]=x[3n+2]$ to be predicted from its immediate neighbors, $x[3n+1]$ and $x[3n+3]$. The updated detail coefficients $d^{new}[n]$ can also be viewed as the error in predicting the lower branch samples. Hence, (12) is rewritten as

$$e[n] = d^{new}[n] = x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3]. \tag{13}$$
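The prediction error above is simple to compute on a sampled signal. The sketch below evaluates $d^{new}[n]$ for every $n$ for which $x[3n+3]$ exists and shows that the step is invertible, since $x[3n+1]$ and $x[3n+3]$ both belong to the approximation branch and are left untouched. The function names and the boundary handling (dropping the last incomplete triplet) are assumptions for illustration.

```python
import numpy as np

def predict_stage(x, t0, t1):
    """d_new[n] = x[3n+2] - t0*x[3n+1] - t1*x[3n+3] for all n with 3n+3 in range."""
    x = np.asarray(x, dtype=float)
    count = max((x.size - 4) // 3 + 1, 0)
    n = np.arange(count)
    return x[3 * n + 2] - t0 * x[3 * n + 1] - t1 * x[3 * n + 3]

def undo_predict_stage(x_partial, d_new, t0, t1):
    """Restore the samples x[3n+2] from d_new and the unchanged neighbours."""
    x = np.array(x_partial, dtype=float)
    n = np.arange(d_new.size)
    x[3 * n + 2] = d_new + t0 * x[3 * n + 1] + t1 * x[3 * n + 3]
    return x

ramp = np.arange(13.0)
d_ramp = predict_stage(ramp, 0.5, 0.5)   # a linear ramp is predicted with zero error
```

With $t_{0} = t_{1} = 0.5$ the predictor is the average of the two neighbors, so any locally linear signal produces zero detail coefficients.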

$T(z)$ is learned by minimizing the mean squared prediction error (mse) given by:

$$\zeta[n] = E(e^{2}[n]) = E\left(\{x[3n+2] - t_{0}\,x[3n+1] - t_{1}\,x[3n+3]\}^{2}\right), \tag{14}$$

where $E(.)$ denotes the expectation operator.

To minimize the mse, the mse vector $\bm{\zeta}$ is differentiated with respect to $\mathbf{t}$ and equated to zero:

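$$\frac{\partial \zeta}{\partial \mathbf{t}} = 0 \;\Longrightarrow\; E[\mathbf{A}^{\prime}\mathbf{A}]\,\mathbf{t} = E[\mathbf{A}^{\prime}\mathbf{b}] \tag{15}$$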

Assuming that the input signal $x[n]$, corresponding to the column space of natural images, is an fBm process, $E[\mathbf{A}^{\prime}\mathbf{A}]$ and $E[\mathbf{A}^{\prime}\mathbf{b}]$ are computed using (7), and (15) is solved for $\mathbf{t}$. On simplifying the structure of Fig. 4, the updated equivalent analysis highpass filter, using the learned predict filter $T(z)$, can be written as:

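$$G_{h}^{new}(z) = G_{h}(z) - \sum_{k=0}^{1} G_{l}\left(z^{\frac{1}{2}} W_{2}^{2k}\right) T\left(z^{\frac{3}{2}} W_{2}^{3k}\right), \tag{16}$$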

where $W_{k}=e^{-j\frac{2\pi}{k}}$. To update the corresponding synthesis lowpass filter, the rational wavelet structure is converted to the equivalent $3$-band structure and the polyphase matrix $\mathbf{E}(z)$ of the analysis side is computed using $G_{0}(z)$, $G_{1}(z)$ and $G_{2}^{new}(z)$. The polyphase matrix $\mathbf{R}(z)$ of the synthesis side is then computed by applying the perfect reconstruction condition [22] given in (17):

$$\mathbf{R}(z)\,\mathbf{E}(z) = c\,z^{-n_{0}}\,\mathbf{I}, \tag{17}$$

where $c\in\mathbb{R}$, $n_{0}\in\mathbb{Z}$, and $\mathbf{I}$ is the $3\times 3$ identity matrix. From $\mathbf{R}(z)$ and (6), the updated filter $F_{l}^{new}(z)$ of the rational wavelet is computed. This completes the predict stage.
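The closed-form predict-stage solution described above can be sketched numerically as follows. Several details are assumptions, since the text does not spell them out: each row of $\mathbf{A}$ is taken to be $[x[3n+1],\ x[3n+3]]$, the corresponding entry of $\mathbf{b}$ is $x[3n+2]$, and the expectations are accumulated over a finite window of `num_terms` triplets starting at $n=0$. The second-order moments come from the fBm auto-covariance, so no image samples are needed once $H$ has been estimated.

```python
import math
import numpy as np

def fbm_autocov(n1, n2, H):
    """fBm auto-covariance with sigma_H^2 = 1 / (Gamma(2H+1) |sin(pi H)|)."""
    var = 1.0 / (math.gamma(2.0 * H + 1.0) * abs(math.sin(math.pi * H)))
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    p = 2.0 * H
    return 0.5 * var * (np.abs(n1) ** p - np.abs(n1 - n2) ** p + np.abs(n2) ** p)

def learn_predict_filter(H, num_terms=1000):
    """Solve E[A'A] t = E[A'b] for t = [t0, t1] under the fBm model (sketch)."""
    n = np.arange(num_terms)
    left, mid, right = 3 * n + 1, 3 * n + 2, 3 * n + 3
    # E[A'A]: summed second moments of the two predictor samples.
    gram = np.array([
        [fbm_autocov(left, left, H).sum(), fbm_autocov(left, right, H).sum()],
        [fbm_autocov(right, left, H).sum(), fbm_autocov(right, right, H).sum()],
    ])
    # E[A'b]: summed cross moments between predictors and the predicted sample.
    cross = np.array([
        fbm_autocov(left, mid, H).sum(),
        fbm_autocov(right, mid, H).sum(),
    ])
    return np.linalg.solve(gram, cross)

t_brownian = learn_predict_filter(0.5)   # ordinary Brownian motion
t_smooth = learn_predict_filter(0.8)     # a Hurst exponent in the range reported later
```

For $H = 0.5$ the solution is $t_{0} = t_{1} = 0.5$, i.e. linear interpolation between the two neighbors, which matches the known conditional mean of Brownian motion and serves as a check on the sketch.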

3.1.2 Update Stage

Next, the update stage filter $S(z)$ shown in Fig. 5 is learned. Again, the rate converter proposed in [13], shown in Fig. 5, is required. The reconstructed signal at the upper branch is denoted $x_{u}[n]$. Since natural images are generally rich in low-frequency content, $x_{u}[n]$ should be as close as possible to the input signal $x[n]$. This allows us to learn the update stage filter by minimizing the energy of the difference between the two signals:

$$\tilde{\mathbf{s}} = \min_{\mathbf{s}} \|\mathbf{x} - \mathbf{x}_{u}\|^{2}, \tag{18}$$

where $\mathbf{s}\equiv S(z)=s_{0}+s_{1}z^{-2}$. The signal $\mathbf{x}_{u}$ can be written in terms of the update stage filter $\mathbf{s}$, which allows us to solve (18). Once $S(z)$ is learned, the analysis lowpass filter is updated as:

$$G_{l}^{new}(z) = G_{l}(z) + G_{h}(z^{2})\,S(z^{3}) \tag{19}$$

The synthesis highpass filter is updated using a method similar to the one used for the synthesis lowpass filter. This completes the proposed learning. Since the lifting framework is modular, more predict and update stages can be appended to obtain longer filterbanks. Note that, to learn the RWT for the column space of natural images, we vectorize each image of the ensemble column-wise and stack the resulting vectors to build a 1-D signal. We then estimate the Hurst exponent $H$ of this column vector and learn the RWT as presented above.

3.2 Proposed Design for the Row Space

For the row-space design, we vectorize all images row-wise and stack them to build a 1-D signal. We then estimate the Hurst exponent $H$ of this row vector and learn the RWT using the method presented in the previous subsection.

4 Application

The proposed RWLS method is applied to natural images as separable wavelets. The performance of the learned RWLS is compared with the standard bi-orthogonal 5/3 and 9/7 wavelets in compressive sensing based reconstruction of natural images of dimension $512\times 512$. An ensemble of ten natural images, shown in Fig. 6, is used to learn the statistically matched rational wavelet structure for the row space and the column space of images. The Hurst exponent is observed to lie between 0.5 and 1.0 for all ten images. Fig. 7 shows the frequency response of the analysis side lowpass and highpass filters matched to the column space of natural images.

A Bernoulli measurement matrix with entries $\pm 1$ is used in CS. Since it is computationally expensive to apply CS to large images, we use block CS [23] with a block size of $32\times 32$. Recently, multilevel wavelet decomposition over an L-shaped pyramid (L-Pyramid) (Fig. 8(b)) was proposed in [24] and observed to perform better in CS applications than the existing multilevel regular pyramid (R-Pyramid) wavelet decomposition (Fig. 8(a)). We decompose the input images to 3 levels using the L-Pyramid wavelet decomposition in our experiments. Table 1 presents reconstruction results in terms of PSNR (peak signal-to-noise ratio) for sampling ratios varying from $90\%$ to $30\%$, where the sampling ratio is the percentage of total samples measured.
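The measurement side of this setup can be sketched as follows. The sketch covers only the $\pm 1$ Bernoulli sensing of $32\times 32$ blocks and the PSNR metric, not the sparse recovery step. Reusing one measurement matrix for all blocks, the 8-bit peak value of 255, and all function names are assumptions made for illustration; the paper does not state these details.

```python
import numpy as np

def bernoulli_matrix(num_measurements, block_len, rng):
    """Measurement matrix with i.i.d. entries drawn from {-1, +1}."""
    return rng.choice([-1.0, 1.0], size=(num_measurements, block_len))

def measure_blocks(image, block=32, ratio=0.5, seed=0):
    """Block CS measurements; image height and width must be multiples of `block`."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    n = block * block
    m = int(round(ratio * n))            # sampling ratio = measurements / samples
    phi = bernoulli_matrix(m, n, rng)    # assumption: same matrix for every block
    blocks = (image.reshape(h // block, block, w // block, block)
                   .swapaxes(1, 2)
                   .reshape(-1, n))      # one row-major vectorized block per row
    return phi, blocks @ phi.T

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    diff = np.asarray(reference, dtype=float) - np.asarray(reconstruction, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

image = np.random.default_rng(1).integers(0, 256, size=(64, 64)).astype(float)
phi, y = measure_blocks(image, block=32, ratio=0.5)
```

At a sampling ratio of 50%, each 1024-sample block is reduced to 512 measurements; a reconstruction algorithm would then recover the wavelet coefficients of each block from `y` and `phi`.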

From Table 1, we note that the proposed RWLS performs better than the standard wavelets on natural images (and comparably at 90% for Img1 and Img4). Although image ‘Img11’ was not part of the ensemble used to learn the RWT, the learned RWLS also performs better on this image, indicating that the proposed learning indeed provides a statistically matched rational system for the class of natural images.

5 Conclusion

A method for statistical learning of the rational wavelet transform (RWLS) for natural images is presented in this work. Natural images are modeled as fBm processes and their statistical properties are used to learn a separable rational wavelet transform. The lifting framework for the rational wavelet is used, which provides a closed-form solution for learning and makes the method computationally efficient. The learned rational wavelet transform is tested in CS-based reconstruction of natural images and is observed to perform better than the existing standard bi-orthogonal wavelet transforms.

Bibliography


[1] S. Ravishankar, B. Wen, and Y. Bresler, “Online sparsifying transform learning - Part I: Algorithms,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 625–636, 2015.
[2] B. Wen, S. Ravishankar, and Y. Bresler, “Video denoising by online 3D sparsifying transform learning,” in Image Processing (ICIP), 2015 IEEE International Conference on. IEEE, 2015, pp. 118–122.
[3] S. Ravishankar and Y. Bresler, “Efficient blind compressed sensing using sparsifying transforms with convergence guarantees and application to magnetic resonance imaging,” SIAM Journal on Imaging Sciences, vol. 8, no. 4, pp. 2519–2557, 2015.
[4] S. Mallat, A wavelet tour of signal processing. Academic Press, 1999.
[5] T. Blu, “An iterated rational filter bank for audio coding,” in Time-Frequency and Time-Scale Analysis, 1996., Proceedings of the IEEE-SP International Symposium on. IEEE, 1996, pp. 81–84.
[6] T. Blu, “Iterated filter banks with rational rate changes connection with discrete wavelet transforms,” Signal Processing, IEEE Transactions on, vol. 41, no. 12, pp. 3232–3244, 1993.
[7] O. Chertov, V. Malchykov, and D. Pavlov, “Non-dyadic wavelets for detection of some click-fraud attacks,” in Signals and Electronic Systems (ICSES), 2010 International Conference on. IEEE, 2010, pp. 401–404.
[8] T.-T. Le, M. Ziebarth, T. Greiner, and M. Heizmann, “Optimized size-adaptive feature extraction based on content-matched rational wavelet filters,” in Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European. IEEE, 2014, pp. 1672–1676.