Comparison-limited Vector Quantization

Joseph Chataignon; Stefano Rini

arXiv:1905.05401·cs.IT·June 28, 2019

TL;DR

This paper introduces a new vector quantization architecture constrained by the number of comparators, not output cardinality, and proposes an algorithm to optimize its configuration for minimal distortion.

Contribution

It presents a first algorithm for configuring comparator-based vector quantizers and evaluates its performance for uniform and Gaussian sources.

Findings

01

Algorithm effectively minimizes distortion for given comparator constraints.

02

Performance evaluated for uniform and Gaussian sources.

03

New architecture suitable for low-cost, energy-efficient A2D conversion.

Equations

$$Y_{jn}={\rm sign}(\mathbf{v}_{j}^{T}\mathbf{X}_{n}+t_{j}),\tag{1}$$

$$\mathbf{Y}_{n}={\rm sign}(\mathbf{V}\mathbf{X}_{n}+\mathbf{t}),\tag{2}$$

$$f_{\rm enc}:\ \{-1,+1\}^{k}\to[2^{\lfloor dR\rfloor}].\tag{3}$$

$$f_{\rm dec}:\ [2^{\lfloor dR\rfloor}]\to\widehat{{\cal X}}^{d}.\tag{4}$$

$$\rho^{n}(X^{n};\widehat{X}^{n}):\ {\cal X}^{n}\times\widehat{{\cal X}}^{n}\to\mathbb{R}^{+},\tag{5}$$

$$\overline{\rho}=\limsup_{n\to\infty}\frac{1}{n}\rho^{n}(X^{n};\widehat{X}^{n}).\tag{6}$$

$$D(d,k,R)=\inf\overline{\rho},\tag{7}$$

$$D(R,\alpha)=\lim_{d\to\infty,\ k=2^{\alpha d}}D(d,k,R),\tag{8}$$

$$\mathrm{r}(m,n)=\sum_{i=0}^{m}\binom{n}{i}\leq 2^{n},\tag{9}$$

$$\rho^{n}(X^{n};\widehat{X}^{n})=\frac{1}{n}\sum_{i=1}^{n}|X_{i}-\widehat{X}_{i}|^{2}.\tag{10}$$

$$D(d,k,R)=\sum_{i=1}^{d}\mathbb{E}\left[\|X_{i}-\widehat{X}_{i}\|^{2}\right],\tag{11}$$

Full text

Joseph Chataignon

Télécom Saint-Étienne

Université Jean Monnet, France

Stefano Rini

Department of Electrical and Computer Engineering

National Chiao Tung University, Taiwan

Abstract

A variation of the classic vector quantization problem is considered, in which the analog-to-digital (A2D) conversion is not constrained by the cardinality of the output but rather by the number of comparators available for quantization. More specifically, we consider the scenario in which a vector quantizer of dimension $d$ comprises $k$ comparators, each receiving a linear combination of the inputs and producing zero/one when this signal is above/below a threshold. Given a distribution of the inputs and a distortion criterion, the linear combinations and thresholds are to be configured so as to minimize the distortion between the quantizer input and its reconstruction. This vector quantizer architecture naturally arises in many A2D conversion scenarios in which the quantizer’s cost and energy consumption are severely restricted. For this novel vector quantizer architecture, we propose an algorithm to determine the configuration and provide the first performance evaluation for the case of uniform and Gaussian sources.

I Introduction

Quantization, that is, transforming continuous amplitude values into discrete ones so as to minimize a prescribed distortion measure subject to an output cardinality constraint, is one of the fundamental signal processing operations. As such, quantization has been studied in a number of contexts, and a vast number of results have been derived for this problem. In this paper we consider a variation of this problem, so far neglected in the literature, that is relevant to the design of low-cost, energy-efficient quantizers. Vector quantizers are typically manufactured using op-amp comparators, each of which obtains a linear combination of the quantizer inputs and a bias and produces a zero/one voltage depending on whether the comparison between these two signals is positive/negative. Generally speaking, comparators are components with high power consumption and manufacturing cost: for this reason it is reasonable to evaluate the cost of a quantizer in terms of the number of comparators it requires. Let us consider the case of a two-dimensional quantizer ($d=2$) in which each dimension is quantized with a rate of 1.5 bits-per-sample ($R=1.5\ \rm bps$). When quantizing a generic source, the three discretization points are separated in Voronoi regions as in Fig. 1.a, so that the number of comparators required is $k=3$. More generally, since every two reconstruction points are separated by an edge of the Voronoi region, a vector quantizer requires $2^{R}(2^{R}-1)/2$ comparators. This scaling of the quantizer cost is generally valid for low-rate quantizers, since in this scenario the number of neighbors of each reconstruction point is large. Given that this quantization regime is naturally associated with low-cost devices, the cost scaling seems to be particularly disadvantageous. A question that naturally arises is whether a better scaling of the quantizer cost can be attained.

To address this question, note that the number of comparators $k$ equals the total number of hyperplane segments in the Voronoi regions. Accordingly, the best scaling is attained when the $k$ hyperplanes induce the largest number of partitions of the space of dimension $d$. After some geometrical consideration, one realizes that for the case of $d=2$ and $R=1.5$, the largest number of partitions is indeed $7$ and corresponds to the configuration in Fig. 1.b. It is now apparent that there is a large gap in the optimal quantizer design depending on whether one constrains the number of points used in reconstruction or the number of comparators employed by the quantizer.
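As a quick check of the two counts quoted in this example, the arithmetic can be written out explicitly. This is a worked illustration only: $N$ is a symbol introduced here for the number of reconstruction points, and $\mathrm{r}(m,n)$ is the region-counting bound stated later in Lem. II.1.

```latex
% Pairwise separation of N = 3 reconstruction points (Fig. 1.a):
\frac{N(N-1)}{2}\bigg|_{N=3} = \frac{3\cdot 2}{2} = 3 \ \text{comparators}.
% Largest number of regions induced by k = 3 lines in the plane (Fig. 1.b):
\mathrm{r}(2,3) = \binom{3}{0}+\binom{3}{1}+\binom{3}{2} = 1+3+3 = 7 \ \text{regions}.
```

The same three comparators can therefore support up to seven reconstruction points instead of three.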

Relevant Results

The idea of accounting for the number of comparators required for A2D conversion emerges from the work in [1]: here the authors investigate the capacity of a MIMO channel with output quantization constraints, i.e. the MIMO channel in which the channel output is processed at the receiver using a finite number of one-bit threshold quantizers. The channel model of [1] is relevant in mm-wave communications, which allow for a large number of receive antennas while the number of A2D conversion modules remains small due to limitations in the energy and cost of RF modules. Building on an idea of [2], a connection between combinatorial geometry and the MIMO channel with output quantization constraints of [1] is drawn in [3]. In particular, in [3] it is shown that each quantizer can be interpreted as a hyperplane bisecting the transmitter signal space; for this reason the largest rates are attained by the configuration allowing for the largest number of partitions induced by the set of hyperplanes. To the best of our knowledge, the problem of hardware-limited quantization has so far only been considered in [4]. The difference between our approach and that of [4] is that we focus on vector quantization and consider the comparator limitations on the vector, rather than scalar, quantizer input.

Contribution

In the following, we define the comparison-limited vector quantization problem in its full generality as a variation of the classical vector quantization problem. We also provide a first algorithm for the quantizer design: although not optimal, this first approach investigates the combinatorial geometric aspects of the optimal quantization design problem. Numerical evaluations are provided for the case of iid Gaussian and uniform sources. The performance of the proposed quantizer is compared with the classic Max-Lloyd quantizer design [5, 6].

Paper Organization

The paper is organized as follows: Sec. II presents the vector quantization model, Sec. III introduces the proposed design algorithm. Sec. IV provides relevant numerical evaluations. Finally, Sec. V concludes the paper.

Notation

In the remainder of the paper, all logarithms are taken in base two. With $\mathbf{x}=[x_{1},\ldots,x_{N}]\in{\cal X}^{N}$ we indicate a sequence of elements from ${\cal X}$ with length $N$. The notation $\mathbf{x}_{i}^{j}$ indicates the substring $[x_{i},\ldots,x_{j}]$ of $\mathbf{x}$. The function ${\rm sign}(\mathbf{x})$ returns a vector with values in $\{-1,+1\}$ which equals the sign of each entry of the vector $\mathbf{x}$. Random vectors are indicated as $\mathbf{U}=[U_{1}\ldots U_{L}]^{T}\in\mathbb{R}^{L}$. The set $\{1,\ldots,N\}$ is indicated as $[N]$.

II Comparison-limited vector quantizer model

We consider the source quantization scenario in Fig. 2, in which the source sequence $\{X_{i}\}_{i\in\mathbb{N}}$ has support ${\cal X}$. The source sequence is parsed into super-symbols $\{\mathbf{X}_{n}\}_{n\in\mathbb{N}}$ of dimension $d$ with $\mathbf{X}_{n}=[X_{dn+1},\ldots,X_{d(n+1)}]$, where $d$ is referred to as the dimension of the vector quantizer. The $j^{\rm th}$ comparator obtains a linear combination of each super-symbol $\mathbf{X}_{n}$ and produces the signal $Y_{jn}$ as

$$Y_{jn}={\rm sign}(\mathbf{v}_{j}^{T}\mathbf{X}_{n}+t_{j}),\tag{1}$$

for $j\in[k]$. The value $k$ is referred to as the resolution of the vector quantizer; $\mathbf{v}_{j}\in\mathbb{R}^{d}$ and $t_{j}\in\mathbb{R}$ are fixed and known. The outputs of the $k$ comparators in (1) are more conveniently expressed in vector form as

$$\mathbf{Y}_{n}={\rm sign}(\mathbf{V}\mathbf{X}_{n}+\mathbf{t}),\tag{2}$$

where $\mathbf{V}\in\mathbb{R}^{k\times d}$ is such that the $i^{\rm th}$ row corresponds to the vector $\mathbf{v}_{i}$ in (1). Similarly, $\mathbf{t}\in\mathbb{R}^{k}$ has the $i^{\rm th}$ entry equal to $t_{i}$ and $\mathbf{Y}_{n}=[Y_{1n},\ldots,Y_{kn}]$. The set $[\mathbf{V},\mathbf{t}]$ is referred to as the configuration of the linear combiner. The super-symbol $\mathbf{Y}_{n}$ is provided to a source encoder that produces a bit-restricted representation of the comparators’ output as $m_{n}\in[2^{\lfloor dR\rfloor}]$, where $R$ is referred to as the rate of the quantizer, through the source encoding mapping

$$f_{\rm enc}:\ \{-1,+1\}^{k}\to[2^{\lfloor dR\rfloor}].\tag{3}$$

The message $m_{n}$ is provided to a source decoder which produces a reconstruction of the source super-symbol $\mathbf{X}_{n}$, $\mathbf{\widehat{X}}_{n}=[\widehat{X}_{dn+1},\ldots,\widehat{X}_{d(n+1)}]$ with $\widehat{X}_{i}\in\widehat{{\cal X}}$, through the source decoding mapping

$$f_{\rm dec}:\ [2^{\lfloor dR\rfloor}]\to\widehat{{\cal X}}^{d}.\tag{4}$$
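The chain from comparators to reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the configuration `V`, `t`, the reconstruction points, and all function names are hypothetical, and mapping a comparator input of exactly zero to +1 is a convention chosen here.

```python
def comparator_outputs(V, t, x):
    """Return Y = sign(V x + t) with entries in {-1, +1} (zero is mapped to +1)."""
    y = []
    for row, bias in zip(V, t):
        s = sum(v * xi for v, xi in zip(row, x)) + bias
        y.append(1 if s >= 0 else -1)
    return tuple(y)


def build_encoder(patterns):
    """Encoder sketch: assign message indices 1..M to the distinct sign patterns."""
    return {p: m for m, p in enumerate(sorted(set(patterns)), start=1)}


# Hypothetical configuration: d = 2, k = 2, one comparator per coordinate axis.
V = [[1.0, 0.0], [0.0, 1.0]]
t = [0.0, 0.0]
all_patterns = [(a, b) for a in (-1, 1) for b in (-1, 1)]
f_enc = build_encoder(all_patterns)
# Hypothetical decoder: one reconstruction point per quadrant.
f_dec = {f_enc[(a, b)]: (0.5 * a, 0.5 * b) for (a, b) in all_patterns}


def quantize(x):
    """Comparators, then encoder, then decoder."""
    return f_dec[f_enc[comparator_outputs(V, t, x)]]
```

With this toy configuration, `quantize((0.3, -2.0))` lands in the quadrant with sign pattern `(1, -1)` and returns `(0.5, -0.5)`. Four messages are used here, which corresponds to $dR=2$.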

The performance of the vector quantizer is evaluated through a distortion measure

$$\rho^{n}(X^{n};\widehat{X}^{n}):\ {\cal X}^{n}\times\widehat{{\cal X}}^{n}\to\mathbb{R}^{+},\tag{5}$$

for $n\in\mathbb{N}$ which is assumed non-decreasing in $n$ . For a given configuration of the linear combiners $[\mathbf{V},\mathbf{t}]$ , source encoder/decoder mappings $f_{\rm enc}$ / $f_{\rm dec}$ , and given a distortion measure between input and reconstruction sequence $\rho^{n}(X^{n};\widehat{X}^{n})$ , the performance of the quantizer is evaluated as

$$\overline{\rho}=\limsup_{n\to\infty}\frac{1}{n}\rho^{n}(X^{n};\widehat{X}^{n}).\tag{6}$$

The optimal quantizer performance for the distortion $\overline{\rho}$ , dimension $d$ , resolution $k$ and rate $R$ is obtained as

$$D(d,k,R)=\inf\overline{\rho},\tag{7}$$

where the infimization is over all linear combiner configurations and source encoder/decoder mappings.

Comparison-limited distortion-rate function

Let $k=2^{\alpha d}$; then the comparison-limited distortion-rate function is defined as

$$D(R,\alpha)=\lim_{d\to\infty,\ k=2^{\alpha d}}D(d,k,R),\tag{8}$$

for $D(d,k,R)$ in (7); that is, $D(R,\alpha)$ is the minimum distortion attainable as the quantizer dimension grows to infinity while the message support grows as $2^{dR}$ and the number of comparators as $2^{\alpha d}$. One can also show that $D(R,\alpha)$ in (8) for $\alpha\geq 2R$ corresponds to the classical rate-distortion function [7, Ch. 13].

Remark 1

The above vector quantizer architecture formulation is rather general: in the remainder we consider only the case in which $R=\infty$ in (7), i.e. $D(d,k,\infty)$. This corresponds to the scenario in which the communication rate between the source encoder and the decoder in (2) is unbounded.

Some combinatorial notions

In the following, we utilize some simple combinatorial concepts which we briefly introduce here.

A hyperplane arrangement ${\cal A}$ is a finite set of $n$ affine hyperplanes in $\mathbb{R}^{m}$ for some $n,m\in\mathbb{N}$. A hyperplane arrangement ${\cal A}=\{\mathbf{x}\in\mathbb{R}^{m},\ \mathbf{a}_{i}^{T}\mathbf{x}=b_{i}\}_{i=1}^{n}$ can be expressed as ${\cal A}=\left\{\mathbf{x},\ \mathbf{A}\mathbf{x}=\mathbf{b}\right\}$ where $\mathbf{A}$ is obtained by letting each row $i$ correspond to $\mathbf{a}_{i}^{T}$ and defining $\mathbf{b}=[b_{1}\ldots b_{n}]^{T}$. A hyperplane arrangement is said to be in General Position (GP) if and only if every $n\times n$ sub-matrix of $\mathbf{A}$ has non-zero determinant [8]. A hyperplane arrangement induces a partition of the space $\mathbb{R}^{m}$ into a number of regions.

Lemma II.1

A hyperplane arrangement of size $n$ in $\mathbb{R}^{m}$ divides $\mathbb{R}^{m}$ into at most

$$\mathrm{r}(m,n)=\sum_{i=0}^{m}\binom{n}{i}\leq 2^{n},\tag{9}$$

regions. Hyperplanes in GP divide the space into exactly $\mathrm{r}(m,n)$ regions.

We see from Lem. II.1 that the largest number of reconstruction points for $D(d,k,\infty)$ is $\mathrm{r}(d,k)$ .
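The bound of Lem. II.1 is easy to evaluate numerically. The helper below is an illustrative sketch (the name `max_regions` is introduced here and is not from the paper); it simply sums the binomial coefficients in the definition of $\mathrm{r}(m,n)$.

```python
from math import comb


def max_regions(m: int, n: int) -> int:
    """r(m, n): largest number of regions that n hyperplanes can cut out of R^m."""
    # comb(n, i) is zero whenever i > n, so the sum never exceeds 2**n.
    return sum(comb(n, i) for i in range(m + 1))
```

For instance, `max_regions(2, 3)` evaluates to 7, the number of regions quoted in the Introduction for three comparators in the plane, and `max_regions(2, 5)` evaluates to 16, the largest number of reconstruction points available to a five-comparator quantizer with $d=2$.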

III Design algorithm

In this section, we propose an algorithm to numerically determine the linear combiner configuration and source reconstruction attaining $D(d,k,\infty)$. For simplicity we assume the case of iid sources to be reconstructed under Mean Squared Error (MSE) distortion, i.e.

$$\rho^{n}(X^{n};\widehat{X}^{n})=\frac{1}{n}\sum_{i=1}^{n}|X_{i}-\widehat{X}_{i}|^{2}.\tag{10}$$

In this scenario, one can show that (7) simplifies to

$$D(d,k,R)=\sum_{i=1}^{d}\mathbb{E}\left[\|X_{i}-\widehat{X}_{i}\|^{2}\right],\tag{11}$$

where $\mathbf{X}=[X_{1},\ldots,X_{d}]$ and $\mathbf{\widehat{X}}=[\widehat{X}_{1},\ldots,\widehat{X}_{d}]$, for $X_{i},\widehat{X}_{i}$ in (11), are identically distributed across super-symbols, so that the subscript $n$ can be dropped.$^{1}$

$^{1}$ That is, every super-symbol $\mathbf{X}_{n}$ is quantized in the same manner, regardless of $n$.

Similarly to the classic Max-Lloyd algorithm, the optimal quantizer design for the model in Sec. II can be divided into two optimization steps to be iterated until convergence: (i) the optimization of the set of reconstruction points $\mathbf{\widehat{X}}$ for a given combiner configuration $[\mathbf{V}\ \mathbf{t}]$ and (ii) the optimization of the combiner configuration for a given set of reconstruction points. The optimization step (i) is rather straightforward, as the reconstruction points are chosen as the centroids of the regions induced by the hyperplane arrangement $[\mathbf{V}\ \mathbf{t}]$ in $\mathbb{R}^{d}$ [9]. The optimization step (ii) is more involved and we propose two methods for this optimization: a global configuration update and a local one, as described in Algorithm 1. In the global configuration update, all hyperplanes are randomly perturbed with a perturbation whose variance decreases with the iteration number. In the local configuration update, a hyperplane is selected at random and its position is optimally determined so as to minimize the MSE of the reconstruction points separated by that hyperplane. One of these two methods is selected at random at every iteration, with the probability of using the global configuration update decreasing exponentially over iterations.
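Step (i) and the random choice between the two updates of step (ii) can be sketched as follows. This is not a reproduction of Algorithm 1: centroids are estimated empirically from a list of samples, the decay rate `decay` and the noise schedule `sigma0 / (1 + iteration)` are hypothetical choices, and the local update (the optimal repositioning of a single hyperplane) is deliberately left out.

```python
import math
import random


def region_of(V, t, x):
    """Sign pattern of x, which identifies the region of the arrangement containing it."""
    return tuple(1 if sum(v * xi for v, xi in zip(row, x)) + b >= 0 else -1
                 for row, b in zip(V, t))


def centroids(V, t, samples):
    """Step (i): empirical centroid of every non-empty region."""
    sums, counts = {}, {}
    for x in samples:
        r = region_of(V, t, x)
        acc = sums.setdefault(r, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[r] = counts.get(r, 0) + 1
    return {r: tuple(s / counts[r] for s in acc) for r, acc in sums.items()}


def global_update_probability(iteration, decay=0.05):
    """Exponentially decreasing probability of a global update (decay is assumed)."""
    return math.exp(-decay * iteration)


def choose_update(iteration, rng):
    """Pick 'global' or 'local' at random for the current iteration."""
    return "global" if rng.random() < global_update_probability(iteration) else "local"


def global_update(V, t, iteration, rng, sigma0=0.5):
    """Perturb all hyperplanes with Gaussian noise whose variance shrinks over iterations."""
    sigma = sigma0 / (1 + iteration)
    V_new = [[v + rng.gauss(0.0, sigma) for v in row] for row in V]
    t_new = [b + rng.gauss(0.0, sigma) for b in t]
    return V_new, t_new
```

A full design loop would alternate `centroids` with one of the two configuration updates and keep track of the resulting MSE; how a perturbed configuration is accepted or rejected is not specified in the text above and is left open here.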

The reasoning behind the local and global updates is as follows: consider a lower triangular matrix $M$ of size $\mathrm{r}(d,k)\times\mathrm{r}(d,k)$ and let the element in position $i\times j$ in $M$ equal the index of a hyperplane separating $\widehat{X}_{i}$ and $\widehat{X}_{j}$ if such reconstruction points exist and zero otherwise.$^{2}$ The matrix $M$ can be thought of as one among a finite number of ways in which hyperplanes separate the reconstruction points. In this view, the local update maximizes the quantizer performance for a given value of $M$. The global update, instead, allows “hyperplanes to jump over centroids”, resulting in a different matrix $M$.

$^{2}$ Note that some hyperplane arrangements induce fewer than $\mathrm{r}(d,k)$ regions: we assume that there exists a natural numbering of the possible $\mathrm{r}(d,k)$ regions.

A crucial step in step (ii) of the design algorithm is the evaluation of the MSE for a given hyperplane configuration and reconstruction points. Given numerical precision limitations, the MSE evaluation has to be approximated using numerical integration methods and particle filters as in Algorithm 2. More specifically, random points are generated and assigned to the corresponding reconstruction point until a minimum number of points per region is attained.
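The sampling procedure described above can be sketched as a simple Monte Carlo estimate. Algorithm 2 itself is not reproduced in this text, so the code below is only a sketch in its spirit: the function names, the `draw` callback, and the `max_draws` safeguard are assumptions introduced here, and the error is measured as the squared Euclidean norm per super-symbol, as in (11).

```python
import random


def region_of(V, t, x):
    """Sign pattern of x: one entry in {-1, +1} per hyperplane."""
    return tuple(1 if sum(v * xi for v, xi in zip(row, x)) + b >= 0 else -1
                 for row, b in zip(V, t))


def estimate_mse(V, t, recon, draw, min_per_region, rng, max_draws=200_000):
    """Monte Carlo estimate of E[||X - X_hat||^2].

    recon maps each sign pattern to its reconstruction point and draw(rng)
    returns one source vector. Sampling stops once every region in recon
    holds min_per_region points, or after max_draws draws (a safeguard
    added here for regions of very small probability).
    """
    counts = {r: 0 for r in recon}
    total, n = 0.0, 0
    while n < max_draws and min(counts.values()) < min_per_region:
        x = draw(rng)
        r = region_of(V, t, x)  # a KeyError below means recon misses this region
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, recon[r]))
        counts[r] += 1
        n += 1
    return total / n


# Hypothetical example: d = 1, k = 1, source uniform on [-1, 1].
V, t = [[1.0]], [0.0]
recon = {(1,): (0.5,), (-1,): (-0.5,)}
draw = lambda rng: (rng.uniform(-1.0, 1.0),)
mse = estimate_mse(V, t, recon, draw, min_per_region=2000, rng=random.Random(0))
```

In this toy case each region is an interval of length one with its centroid as reconstruction point, so the estimate should be close to the exact value $1/12$.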

Note that an approach similar to that of Algorithm 2 is used in step (i) of the algorithm when estimating the centroid of each region induced by the hyperplane arrangement, as this evaluation also requires numerical integration.

By alternating the optimization step (i) and step (ii) through the MSE approximation approach in Algorithm 2, the algorithm converges to a numerical solution. Upon multiple random restarts of the algorithm, convergence to multiple local minima is sometimes observed. These minima arise either from a limitation in the precision of the numerical integration or from a local minimum in the quantizer configuration.

An example of convergence to multiple local minima upon multiple random initializations is shown in Fig. 3: while the arrangement in Fig. 3.a has 6 centroids, the one in Fig. 3.b has 7, but both attain similar performance in the case of a standard Gaussian distribution. From a high-level perspective, though, one observes that the two configurations are rather distant and that the proposed algorithm is not able to converge to the better solution in Fig. 3.b starting from the configuration in Fig. 3.a.

Remark 2

We conjecture that the number of possible matrices $M$ that describe how reconstruction points are obtained from the hyperplane configuration grows only polynomially with $d$ and $k$. If this conjecture were true, it would be computationally feasible to consider all possible matrices $M$ in order to avoid local minima. Unfortunately, we are currently unable to prove this conjecture.

IV Simulation results

In this section we present the quantization performance of the quantizer in Sec. II for a configuration obtained using the algorithm in Sec. III for the case of (i) a standard Gaussian and (ii) a unitary uniform distribution. In both instances, the performance is compared to that attainable using a quantizer designed with the classic Max-Lloyd algorithm with the same number of reconstruction points.

IV-A Gaussian distribution

As one could expect, the quantizer obtained performs slightly worse than an optimal quantizer obtained by the Max-Lloyd algorithm with the same number of reconstruction points. However, when the comparison is made for the same number of hyperplanes rather than the same number of reconstruction points, it performs better than the Max-Lloyd quantizer, as can be seen in Fig. 4. The ratio of the MSE of the proposed quantizer to that of the Max-Lloyd quantizer is 0.64 for a configuration with 5 hyperplanes.

A result that may be surprising is that the hyperplanes do not necessarily form the maximum number of regions. Interestingly, they often arrange themselves to form fewer regions with similar probabilities, whether or not the starting configuration has many regions; this turns out to perform relatively well, and sometimes better than configurations with more regions.

IV-B Uniform distribution

The results obtained with a uniform distribution are similar to the ones obtained with a Gaussian distribution. Again, the quantizer obtained by the algorithm performs worse than the Max-Lloyd quantizer with the same number of reconstruction points, but better than the Max-Lloyd quantizer with the same number of hyperplanes. This is shown in Fig. 4. The ratio of the MSE of the proposed quantizer to that of the Max-Lloyd quantizer is 0.72 for a configuration with 5 hyperplanes.

Because of the square shape of the distribution support, the hyperplanes are even more likely to form rectangular regions than with the Gaussian distribution, as Fig. 5 illustrates.

V Conclusion

In this paper, a novel paradigm for vector quantization is considered. In this paradigm, the performance of the quantizer is not limited by the number of reconstruction points, as in the classic Max-Lloyd quantizer, but rather by the number of comparisons necessary to determine the quantizer output. In particular, we consider the case in which a vector quantizer comprises $k$ comparators which receive a linear combination of the quantizer input plus a constant and output the sign of the received signal. Given a distribution of the quantizer input, $k$, and a distortion measure between source and reconstruction, we consider the problem of optimally determining the linear combination and constant coefficient so that the distortion between source and reconstruction is minimized. We propose a first algorithm for this optimization problem and apply it to the case of mean squared error distortion and Gaussian and uniform iid sources. In both cases, the performance is compared to that of the Max-Lloyd quantizer.

A number of research directions remain open for this new vector quantizer architecture. In particular, we are investigating the optimal performance attainable in the limit of infinitely long vector quantizers in which the number of available comparators $k$ and the number of bits available to represent the quantizer inputs both grow to infinity at a given constant ratio $\alpha$. This limit should result in a rather interesting generalization of the classic distortion-rate function.

Bibliography

[1] S. Rini, L. Barletta, Y. C. Eldar, and E. Erkip, “A general framework for MIMO receivers with low-resolution quantization,” in 2017 IEEE Information Theory Workshop (ITW). IEEE, 2017, pp. 599–603.
[2] J. Mo and R. W. Heath, “High SNR capacity of millimeter wave MIMO systems with one-bit quantization,” in 2014 Information Theory and Applications Workshop (ITA). IEEE, 2014, pp. 1–5.
[3] A. Khalili, S. Rini, L. Barletta, E. Erkip, and Y. C. Eldar, “On MIMO channel capacity with output quantization constraints,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 1355–1359.
[4] N. Shlezinger, Y. C. Eldar, and M. R. Rodrigues, “Hardware-limited task-based quantization,” arXiv preprint arXiv:1807.08305, 2018.
[5] S. Max, “Quantizing for minimum distortion,” IEEE Transactions on Information Theory, vol. 6, no. 1, pp. 7–12, 1960.
[6] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[8] T. Cover, “Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition,” IEEE Trans. Electron. Comput., vol. EC-14, no. 3, pp. 326–334, Jun. 1965.