Regularized Residual Quantization: a multi-layer sparse dictionary learning approach

Sohrab Ferdowsi; Slava Voloshynovskiy; Dimche Kostadinov

arXiv:1705.00522·cs.LG·May 2, 2017

TL;DR

This paper introduces Regularized Residual Quantization (RRQ), a multi-layer sparse dictionary learning method that improves high-dimensional data quantization and super-resolution tasks by incorporating variance regularization inspired by rate-distortion theory.

Contribution

The paper proposes a novel regularization technique for residual quantization, enabling scalable, sparse, multi-layer dictionaries suitable for high-dimensional data and natural images.

Findings

01

Efficient quantization of high-dimensional variance-decaying data.

02

Improved super-resolution results with quantized facial images.

03

Effective extension of residual quantization to many layers without overfitting.

Abstract

The Residual Quantization (RQ) framework, in which the quantization distortion is successively reduced over multiple layers, is revisited. Inspired by the reverse-water-filling paradigm in rate-distortion theory, an efficient regularization on the variances of the codewords is introduced, which allows RQ to be extended to very large numbers of layers and to high-dimensional data without over-training. The proposed Regularized Residual Quantization (RRQ) results in multi-layer dictionaries that are additionally sparse, thanks to the soft-thresholding nature of the regularization when applied to variance-decaying data, which can arise from de-correlating transformations applied to correlated data. Furthermore, we also propose a general-purpose pre-processing for natural images which makes them suitable for such quantization. The RRQ framework is first tested on synthetic…

Tables

Table I: Normalized quantization distortion on the train and test sets for K-means, random codeword generation (from $\mathcal{N}(\mathbf{0},S)$) and the VR-Kmeans algorithm (average over 5 trials). The theoretical minimum distortion (achieved as $n\rightarrow\infty$) is $0.9185$. Note that K-means, while achieving the lowest distortion on the training set, fails to quantize the test set. VR-Kmeans with a proper $\gamma$, on the other hand, performs best on the test set.

	Kmeans	random generation	VR-Kmeans ( $λ = 0.1$ )	VR-Kmeans ( $λ = 10$ )	VR-Kmeans ( $λ = 1000$ )
distortion train	0.6727	0.9393	0.8441	0.8520	0.8568
distortion test	1.0054	0.9394	0.9413	0.9384	0.9390

Equations

D_{j}=\begin{cases}\gamma,&\text{if }\sigma_{j}^{2}\geqslant\gamma,\\ \sigma_{j}^{2},&\text{if }\sigma_{j}^{2}<\gamma,\end{cases}

\sigma_{C_{j}}^{2}=\big(\sigma_{j}^{2}-\gamma\big)^{+}=\begin{cases}\sigma_{j}^{2}-\gamma,&\text{if }\sigma_{j}^{2}\geqslant\gamma,\\ 0,&\text{if }\sigma_{j}^{2}<\gamma.\end{cases}

\underset{\mathrm{C},\mathrm{A}}{\text{minimize}}\ \ldots

\|\alpha_{i}\|_{0}=\|\alpha_{i}\|_{1}=1.

\underset{\mathrm{C}}{\text{minimize}}\ \text{Tr}\,\ldots+\ldots

\underset{\mathbf{c}(j)}{\text{minimize}}\ \ldots+\ldots

Taxonomy

Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Image Processing Techniques and Applications

Full text

Regularized Residual Quantization: a multi-layer sparse dictionary learning approach

Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov

Department of Computer Science, University of Geneva, Switzerland

{Sohrab.Ferdowsi, svolos, Dimche.Kostadinov}@unige.ch

I Introduction

Quantizing the residual errors from a previous level of quantization has been considered in signal processing for different applications, e.g., image coding. This problem was extensively studied in the 1980s and 1990s (e.g., see [1] and [4]). However, due to strong over-fitting, its efficiency was limited in more modern, larger-scale applications. In practice, it was not feasible to train more than a couple of layers. Particularly in high dimensions, codebooks learned on a training set were not able to quantize a statistically similar test set.

Inspired by an insight from rate-distortion theory, we introduce an effective regularization for the framework of Residual Quantization (RQ), making it capable of learning codebooks over many layers. Moreover, the introduced framework deals effectively with high dimensions, making it feasible to go beyond patch-level processing and deal with entire images. The proposed regularization makes use of the problem of optimal rate allocation for the asymptotic case of independent Gaussian sources, which is reviewed next.

II Background: Quantization of independent Gaussian sources

Given $n$ independent Gaussian sources $X_{j}\sim\mathcal{N}(0,\sigma_{j}^{2})$, each with variance $\sigma_{j}^{2}$, rate-distortion theory gives the optimal rate allocation for this setup as (Ch. 10 of [2]):

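$$D_{j}=\begin{cases}\gamma,&\text{if }\sigma_{j}^{2}\geqslant\gamma,\\ \sigma_{j}^{2},&\text{if }\sigma_{j}^{2}<\gamma,\end{cases}\tag{1}$$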

where $\gamma$ should be chosen to guarantee that $\sum_{j=1}^{n}D_{j}=D$ for the total target distortion $D$. Hence, the optimal codeword variance $\sigma_{C_{j}}^{2}$ is a soft-thresholding of $\sigma_{j}^{2}$:

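$$\sigma_{C_{j}}^{2}=\big(\sigma_{j}^{2}-\gamma\big)^{+}=\begin{cases}\sigma_{j}^{2}-\gamma,&\text{if }\sigma_{j}^{2}\geqslant\gamma,\\ 0,&\text{if }\sigma_{j}^{2}<\gamma.\end{cases}\tag{2}$$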

This means that sources with variances less than $\gamma$ should not be assigned any rate at all. We next incorporate this phenomenon for codebook learning and enforce it as a regularization for the codebook variances. This, in fact, will be an effective way to reduce the gap between the train and test distortion errors. Moreover, the inactivity of the dimensions with variances less than $\gamma$ will also lead to a natural sparsity of codewords, which lowers the computational complexity.
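The allocation described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a total target distortion with $0 < D \leqslant \sum_j \sigma_j^2$, finds $\gamma$ by bisection so that $\sum_j \min(\gamma,\sigma_j^2)=D$, and then soft-thresholds the variances as $(\sigma_j^2-\gamma)^+$. The function name `reverse_water_filling` and the example variances are hypothetical.

```python
def reverse_water_filling(variances, total_distortion, iters=100):
    """Return (gamma, distortions, codeword_vars) for independent Gaussian sources.

    Assumes 0 < total_distortion <= sum(variances).
    """
    lo, hi = 0.0, max(variances)
    for _ in range(iters):
        gamma = 0.5 * (lo + hi)
        # sum_j min(gamma, sigma_j^2) is non-decreasing in gamma, so bisect on it.
        if sum(min(gamma, v) for v in variances) < total_distortion:
            lo = gamma
        else:
            hi = gamma
    gamma = 0.5 * (lo + hi)
    # Per-source distortion: gamma for large variances, sigma_j^2 otherwise.
    distortions = [min(gamma, v) for v in variances]
    # Codeword variance: soft-thresholding of sigma_j^2 by gamma.
    codeword_vars = [max(v - gamma, 0.0) for v in variances]
    return gamma, distortions, codeword_vars


# Hypothetical variance-decaying example with n = 5 sources.
variances = [4.0, 2.0, 1.0, 0.5, 0.25]
gamma, D, var_c = reverse_water_filling(variances, total_distortion=3.0)
print(gamma)   # water level
print(var_c)   # the two smallest-variance sources get zero codeword variance
```

In this example the water level settles between the third and fourth variances, so the last two dimensions receive no rate and their codeword variances are exactly zero, which is the source of the sparsity mentioned above.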

III The proposed approach: RRQ

Instead of the standard K-means used in RQ, we first propose its regularized version and then use it as the building-block for RRQ.

III-A VR-Kmeans algorithm

After de-correlating the data points, e.g., using the pre-processing proposed in Fig. 2, and gathering them in the columns of $\mathrm{X}$ with variance $\sigma_{j}^{2}$ at dimension $j$, define $\mathrm{S}\triangleq\text{diag}([\sigma_{C_{1}}^{2},\cdots,\sigma_{C_{n}}^{2}])$ from Eq. 2. For a codebook $\mathrm{C}$, to regularize only the diagonal elements of $\mathrm{C}\mathrm{C}^{T}$, define $\mathrm{P}_{j}$ as the matrix with all elements zero except $\mathrm{P}_{(j,j)}=1$. We formulate the variance-regularized K-means algorithm with parameter $\lambda$ as:

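$$\underset{\mathrm{C},\mathrm{A}}{\text{minimize}}\ \ldots\quad\text{s.t.}\quad\|\alpha_{i}\|_{0}=\|\alpha_{i}\|_{1}=1.\tag{3}$$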

Like the standard K-means algorithm, we iterate between fixing $\mathrm{C}$ and updating $\mathrm{A}$ , and then fixing $\mathrm{A}$ and updating $\mathrm{C}$ .

Fix $\mathrm{C}$ , update $\mathrm{A}$ : Exactly like the standard K-means.

Fix $\mathrm{A}$ , update $\mathrm{C}$ : Eq. 3 can be re-written as:

$$\underset{\mathrm{C}}{\text{minimize}}\ \text{Tr}\,\ldots+\ldots\tag{4}$$

Here, both $\sum_{j=1}^{n}\mathrm{P}_{j}\mathrm{C}\mathrm{C}^{T}\mathrm{P}_{j}$ and, due to its structure, $\mathrm{A}\mathrm{A}^{T}\triangleq\text{diag}([a_{1},\cdots,a_{k}])$ are diagonal. Therefore, Eq. 4 reduces to independent sub-problems, one for each (active) dimension:

$$\underset{\mathbf{c}(j)}{\text{minimize}}\ \ldots+\ldots$$

where $\mathrm{Z}\triangleq\mathrm{X}\mathrm{A}^{T}=[\mathbf{z}(1),\cdots,\mathbf{z}(n)]^{T}$. These independent problems can be solved easily using Newton's method, for which the derivation of the required gradient and Hessian is straightforward.
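The structure exploited in this section can be checked on a toy example. The sketch below covers only the assignment step (fix $\mathrm{C}$, update $\mathrm{A}$) and the quantities $\mathrm{A}\mathrm{A}^{T}$ and $\mathrm{Z}=\mathrm{X}\mathrm{A}^{T}$; it does not implement the regularized codebook update. All helper names and the data are hypothetical, and plain Python lists stand in for matrices.

```python
def assign(points, codebook):
    """Fix C, update A: index of the nearest codeword for each point."""
    def sq_dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return [min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))
            for x in points]

def one_hot_columns(labels, num_codewords):
    """Build A (K x N): column i has a single 1 at row labels[i]."""
    return [[1 if label == k else 0 for label in labels]
            for k in range(num_codewords)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

# Toy data: N = 5 points in n = 2 dimensions, K = 2 codewords.
points = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 3.0], [3.1, 2.8]]
codebook = [[0.0, 0.0], [3.0, 3.0]]

labels = assign(points, codebook)
A = one_hot_columns(labels, len(codebook))
X = transpose(points)              # n x N, data points as columns
AAt = matmul(A, transpose(A))      # K x K, diagonal with cluster sizes a_k
Z = matmul(X, transpose(A))        # n x K, row j is z(j): per-cluster sums at dimension j

# In standard (unregularized) K-means the codeword update is the cluster mean.
means = [[Z[j][k] / AAt[k][k] for j in range(len(Z))] for k in range(len(AAt))]
print(AAt)
print(means)
```

Because each column of $\mathrm{A}$ has exactly one nonzero entry, the off-diagonal entries of $\mathrm{A}\mathrm{A}^{T}$ vanish and its diagonal holds the cluster sizes, which is what allows the problem to decouple across dimensions.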

III-B Regularized Residual Quantization (RRQ) algorithm

For a fixed number of centroids $K^{(l)}$ at layer $l$, and with $D_{j}^{(l-1)}$ denoting the distortion of the previous quantization stage at dimension $j$, RRQ first specifies $\gamma^{*}=\underset{\gamma}{\text{argmin}}\Big(\Big|\log_{2}{K^{(l)}}-\sum_{j\in\mathcal{A}_{\gamma}}\frac{1}{2}\log_{2}^{+}{\frac{D_{j}^{(l-1)}}{\gamma}}\Big|\Big)$, and then computes the active set of dimensions $\mathcal{A}_{\gamma^{*}}^{(l)}=\{j:1\leqslant j\leqslant n\mid\sigma_{j}^{2}\geqslant\gamma^{*}\}$. The algorithm then quantizes the residual of stage $l-1$ with the VR-Kmeans algorithm described above, and repeats this up to a desired stage $L$, which can be chosen based on a distortion constraint or an overall rate budget.
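The selection of $\gamma^{*}$ for one layer can be sketched as follows. This is an illustrative reading of the rule above, not the authors' implementation: it assumes that the variance $\sigma_j^2$ seen at layer $l$ equals the previous-stage distortion $D_j^{(l-1)}$, and it uses bisection, which applies because the allocated rate is continuous and non-increasing in $\gamma$. The names `rate` and `select_gamma` and the example values are hypothetical, and dimensions are indexed from 0 rather than 1.

```python
import math

def rate(gamma, prev_distortions):
    """Total rate in bits: sum_j 0.5 * log2^+(D_j / gamma)."""
    return sum(0.5 * max(math.log2(d / gamma), 0.0) for d in prev_distortions)

def select_gamma(num_centroids, prev_distortions, iters=100):
    """Find gamma whose allocated rate matches log2(K), plus the active set."""
    target = math.log2(num_centroids)
    # rate(max D_j) = 0 and the rate grows without bound as gamma -> 0.
    lo, hi = 0.0, max(prev_distortions)
    for _ in range(iters):
        gamma = 0.5 * (lo + hi)
        if rate(gamma, prev_distortions) > target:
            lo = gamma   # too many bits: raise the water level
        else:
            hi = gamma   # too few bits: lower the water level
    gamma = hi
    # Assumption: sigma_j^2 at this layer is taken to be D_j^(l-1).
    active = [j for j, d in enumerate(prev_distortions) if d >= gamma]
    return gamma, active


# Hypothetical layer: K = 4 centroids (2 bits) over 5 variance-decaying dimensions.
prev_distortions = [4.0, 2.0, 1.0, 0.5, 0.25]
gamma_star, active_set = select_gamma(4, prev_distortions)
print(gamma_star, active_set)
```

With only 2 bits available in this example, the two lowest-variance dimensions fall below $\gamma^{*}$ and are left out of the active set, so the codewords of this layer are zero in those dimensions.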

IV Experiments

Fig. 1 and Table I compare the performance of VR-Kmeans with K-means in quantizing high-dimensional, variance-decaying independent data. In many practical cases, correlated data behaves similarly after an energy-compacting and de-correlating transform. As seen in the figure, VR-Kmeans regularizes the codeword variances, which reduces the train-test distortion gap: in Table I, K-means reaches a distortion of 0.6727 on the training set but 1.0054 on the test set, whereas VR-Kmeans with $\lambda=10$ gives 0.8520 and 0.9384, respectively.

Fig. 3 demonstrates the performance of the RRQ in super-resolution of similar images. It is clear from this figure that the high-frequency content lost in down-sampling can be reconstructed from a multi-layer codebook learned from face images with full resolution.

Bibliography

[1] C. F. Barnes, S. A. Rizvi, and N. M. Nasrabadi. Advances in residual vector quantization: a review. IEEE Transactions on Image Processing, 5(2):226–262, Feb 1996.
[2] T. Cover and J. Thomas. Elements of Information Theory, 2nd edition. Wiley-Interscience, July 2006.
[3] K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684–698, 2005.
[4] N. M. Nasrabadi and R. A. King. Image coding using vector quantization: a review. IEEE Transactions on Communications, 36(8):957–971, Aug 1988.