Sampling of surfaces and functions in high dimensional spaces

Qing Zou; Mathews Jacob

arXiv:1903.00965·eess.SP·March 5, 2019


TL;DR

This paper presents a novel sampling framework for recovering smooth surfaces and functions in high-dimensional spaces using a nonlinear approach that leverages low-rank features in a lifted space, enabling efficient learning from limited data.

Contribution

It introduces a nonlinear sampling method based on exponential lifting and low-rank features, generalizing union of subspace models for surface and function recovery.

Findings

1. Effective surface recovery from few samples.
2. Low-rank feature properties enable efficient computation.
3. Resembles neural networks with fewer parameters.

Abstract

We introduce a sampling theoretic framework for the recovery of smooth surfaces, and of functions living on smooth surfaces, from few samples. The proposed approach can be thought of as a nonlinear generalization of the union of subspaces models widely used in signal processing. The scheme relies on an exponential lifting of the original data points to a feature space, where the features live on a union of subspaces. The low-rank property of the features is used to recover the surfaces as well as to determine the number of measurements needed to recover the surface. The low-rank property of the features also provides an efficient approach, which resembles a neural network, for the local representation of multidimensional functions on the surface; the significantly reduced number of parameters makes the computational structure attractive for learning inference from limited labeled training data.

Equations

$$\mathcal{S}=\{\mathbf{x}\in\mathbb{R}^{n}:\psi(\mathbf{x})=0\}$$

$$\psi(\mathbf{x})=\sum_{\mathbf{k}\in\Lambda}c_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x}),\quad\mathbf{x}\in[0,1)^{n}$$

$$\sum_{\mathbf{k}\in\Lambda}c_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x})=\mathbf{c}^{T}\phi_{\Lambda}(\mathbf{x})=0$$

$$\phi_{\Lambda}(\mathbf{x})=\begin{bmatrix}\exp(j2\pi\mathbf{k}_{1}^{T}\mathbf{x})&\cdots&\exp(j2\pi\mathbf{k}_{|\Lambda|}^{T}\mathbf{x})\end{bmatrix}^{T}$$

$$\mathbf{c}^{T}\Phi_{\Lambda}(\mathbf{X})=\mathbf{0}$$

$$\operatorname{rank}(\Phi_{\Lambda}(\mathbf{X}))=|\Lambda|-1$$

$$\operatorname{rank}(\Phi_{\Gamma}(\mathbf{X}))=|\Gamma|-|\Gamma:\Lambda|$$

$$N_{i}\geq|\Lambda_{i}|-1,\quad i=1,\dots,M$$

$$N=\sum_{i=1}^{M}N_{i}\geq|\Lambda|-1$$

$$f(\mathbf{x})=\sum_{\mathbf{k}\in\Gamma}a_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x})=\mathbf{a}^{T}\phi_{\Gamma}(\mathbf{x})$$

$$\hat{f}(\mathbf{x})=\mathbf{p}^{T}\kappa(\mathbf{x})$$

$$k_{\Gamma}(\mathbf{x},\mathbf{y})=\phi_{\Gamma}(\mathbf{x})^{H}\phi_{\Gamma}(\mathbf{y})$$


Full text


**Index Terms:** machine learning; inference

1 Introduction

Machine learning algorithms often exploit the extensive structure present in natural datasets, either for visualization or to learn inference. For example, manifold embedding methods model data as points on simpler objects, such as smooth curves or surfaces/manifolds in high-dimensional spaces, for visualization [1]. In practice, measured real-world data are often scarce and corrupted by noise, missing entries, and other measurement errors. Thus, recovering noise-free data and/or learning inference on the data from few noisy measurements are key problems in machine learning applications.

The main focus of this work is to introduce a sampling theory for (a) the recovery of high-dimensional surfaces and (b) the local representation of functions that live on surfaces, from few measurements. We model the surface as the zero set of a multidimensional bandlimited function. A function with a smaller bandwidth translates to a smoother surface/curve; the bandwidth of the function thus serves as a measure of the regularity or complexity of the curve. Under this assumption, we show that the nonlinear features of any point on such a curve, obtained by lifting the point with an exponential map, are annihilated by the coefficients of the level-set function: their inner product is zero. Consequently, the feature matrix is low-rank, which we use to estimate the surface from a few of its samples. We introduce sampling conditions that guarantee the recovery of the surface with high probability, both when the surface is irreducible and when it is the union of several irreducible surfaces.

We generalize the above results to the local representation and recovery of bandlimited multidimensional functions as the interpolation of a few samples on the surface. We also introduce sampling conditions that guarantee the perfect recovery of the function on the surface. Specifically, the low-rank nature of the exponential features of the surface provides an elegant way to represent the function locally using considerably fewer parameters. This significant reduction in the number of free parameters makes learning the function from finitely many samples tractable. The computational structure of the representation is essentially a two-layer kernel network. Note that the approximation is highly local: the true function and the local representation match only on the curve/surface, while they may deviate significantly at points off the curve/surface. This behavior may explain the sensitivity of practical machine learning algorithms to adversarial attacks. Specifically, the function approximation can be exact as long as the inputs are constrained to the data manifold; an adversarial attack designed to move the input away from the surface can result in unexpected function values.

This work builds upon our prior work [2, 3, 4], where we considered the recovery of planar curves from their samples. The present paper extends these results in three important ways: (i) the planar results are generalized to the high-dimensional setting; (ii) the worst-case sampling conditions are replaced by high-probability results, which are far less conservative and agree well with experimental results; and (iii) the sampling results are extended to the local representation of functions.

In particular, we show that the function can be evaluated by interpolating its values at admissible anchor points on the curve/surface with a Dirichlet kernel.

2 Background

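In this work, we model the surface as the zero level-set

$$\mathcal{S}=\{\mathbf{x}\in\mathbb{R}^{n}:\psi(\mathbf{x})=0\} \tag{1}$$

of the bandlimited function $\psi$:

$$\psi(\mathbf{x})=\sum_{\mathbf{k}\in\Lambda}c_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x}),\quad\mathbf{x}\in[0,1)^{n}, \tag{2}$$

where the bandwidth $\Lambda\subset\mathbb{Z}^{n}$ is a finite set of frequencies.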

The cardinality of the set $\Lambda$, denoted by $|\Lambda|$, is the number of free parameters in the surface representation. The complexity of $\mathcal{S}$ grows with the bandwidth of $\psi$; $|\Lambda|$ is hence a measure of the complexity of the surface. We refer to the $\psi$ satisfying (1) whose coefficient set $\{c_{\mathbf{k}}:\mathbf{k}\in\Lambda\}$ has minimal support as the minimal polynomial.

We now consider an arbitrary point $\mathbf{x}$ on the surface $\mathcal{S}$ . By (1), we have $\psi(\mathbf{x})=0$ . Using the bandlimited representation in (2), we have

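$$\sum_{\mathbf{k}\in\Lambda}c_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x})=\mathbf{c}^{T}\phi_{\Lambda}(\mathbf{x})=0.$$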

Here, $\phi_{\Lambda}:\mathbb{R}^{n}\to\mathbb{C}^{|\Lambda|}$ is a nonlinear feature map, which lifts a point $\mathbf{x}\in[0,1)^{n}$ to a higher dimensional space:

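$$\phi_{\Lambda}(\mathbf{x})=\begin{bmatrix}\exp(j2\pi\mathbf{k}_{1}^{T}\mathbf{x})&\cdots&\exp(j2\pi\mathbf{k}_{|\Lambda|}^{T}\mathbf{x})\end{bmatrix}^{T}.$$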
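As a quick numerical check of the identity $\mathbf{c}^{T}\phi_{\Lambda}(\mathbf{x})=0$, the sketch below lifts a point with the exponential feature map and takes its inner product with the coefficient vector. The test curve is a hypothetical example chosen for illustration, not one from the paper: $\psi(\mathbf{x})=\cos(2\pi x_1)+\cos(2\pi x_2)-0.5$, whose frequencies fit in $\Lambda=\{-1,0,1\}^2$. The helper name `feature_map` is our own.

```python
import numpy as np

def feature_map(x, freqs):
    """Exponential lifting phi_Lambda(x) = [exp(j 2 pi k^T x)] for each k in freqs."""
    return np.exp(2j * np.pi * (freqs @ x))

# Hypothetical example: Lambda = {-1, 0, 1}^2 and
# psi(x) = cos(2 pi x1) + cos(2 pi x2) - 0.5, expanded into complex exponentials.
freq_list = [(k1, k2) for k1 in (-1, 0, 1) for k2 in (-1, 0, 1)]
coef = {(1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5, (0, 0): -0.5}
freqs = np.array(freq_list)
c = np.array([coef.get(k, 0.0) for k in freq_list])

# A point on the zero set: x1 = 0 forces cos(2 pi x2) = -0.5, e.g. x2 = 1/3.
on_curve = np.array([0.0, 1.0 / 3.0])
off_curve = np.array([0.0, 0.0])  # psi(0, 0) = 1 + 1 - 0.5 = 1.5

print(abs(c @ feature_map(on_curve, freqs)))   # numerically zero (round-off only)
print(abs(c @ feature_map(off_curve, freqs)))  # 1.5
```

Off the curve the inner product simply returns $|\psi(\mathbf{x})|$, which is why annihilation characterizes points on $\mathcal{S}$.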

Using the one-to-one correspondence between the trigonometric polynomials in (2) and complex polynomials established in [4, 5], we define the irreducibility of trigonometric polynomials. Specifically, we say that a trigonometric polynomial $\eta(\mathbf{x})$ is irreducible if the corresponding complex polynomial $\mathcal{P}[\eta]$ is irreducible in $\mathbb{C}[z_{1},\cdots,z_{n}]$.

Definition 1.

A surface is termed irreducible if it is the zero set of an irreducible trigonometric polynomial.

Lemma 2.

The zero set of an irreducible minimal trigonometric polynomial can only have one connected component.

3 Surface recovery from samples

Assume that we have $N$ samples on the surface. We define the feature matrix of the sampling set $\mathbf{X}=\{\mathbf{x}_{1},\cdots,\mathbf{x}_{N}\}$ as $\Phi_{\Lambda}(\mathbf{X})=[\phi_{\Lambda}(\mathbf{x}_{1})\quad\cdots\quad\phi_{\Lambda}(\mathbf{x}_{N})]$. Since all of these points satisfy (1), we have

$$\mathbf{c}^{T}\Phi_{\Lambda}(\mathbf{X})=\mathbf{0}. \tag{4}$$

One can use the above null space relation to recover the coefficient vector $\mathbf{c}$ from the samples $\mathbf{X}$ . Note that there is a one-to-one correspondence (up to scaling) between the coefficient set and the curve. Hence, to uniquely determine the zero level-set of $\psi(\mathbf{x})$ , we require

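$$\operatorname{rank}(\Phi_{\Lambda}(\mathbf{X}))=|\Lambda|-1, \tag{5}$$

that is, the null space of $\Phi_{\Lambda}(\mathbf{X})^{T}$ must be one-dimensional and spanned by $\mathbf{c}$.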

The following result tells us when the feature matrix satisfies this rank condition and thus guarantees the recovery of $\mathcal{S}$.

3.1 Irreducible surfaces

We first focus on the case where $\mathcal{S}$ is an irreducible surface. These results generalize the recovery of subspaces or low-rank matrices from few samples.

Theorem 3.

Let $\mathcal{S}$ be an irreducible surface, which is the zero level-set of an irreducible trigonometric polynomial $\psi(\mathbf{x})$ with bandwidth $\Lambda$. Let $\mathbf{X}=\{\mathbf{x}_{1},\cdots,\mathbf{x}_{N}\}\subset\mathcal{S}$ be $N$ samples drawn independently at random. Then, the feature matrix $\Phi_{\Lambda}(\mathbf{X})$ satisfies (5) with high probability, provided $N\geq|\Lambda|-1$.

When $n=2$, the surface reduces to a planar curve, which is the case considered in [2, 3]. Specifically, if the bandwidth $\Lambda$ is a rectangular region of size $k_{1}\times k_{2}$, the results in [2, 3] show that the curve can be recovered from $(k_{1}+k_{2})^{2}$ samples. By contrast, the above recovery guarantee reduces the sampling requirement to $|\Lambda|-1=k_{1}k_{2}-1$, which is essentially the number of degrees of freedom of the curve. This significant reduction in the number of samples is obtained by relaxing the recovery conditions from worst case to high probability. It is possible to choose $k_{1}k_{2}-1$ or more samples on $\mathcal{S}$ such that the rank of the feature matrix is less than $|\Lambda|-1$; however, such a choice has probability zero when the samples are drawn at random. The gain in sampling is considerably more significant in high dimensions, where the direct extension of the results in [2, 3] suggests $(\sum_{i=1}^{n}k_{i})^{n}$ samples, while the proposed approach only requires $(\prod_{i=1}^{n}k_{i})-1$ samples. For example, when $k_{i}=5$ for $i=1,2,3$, the worst-case guarantee requires $15^{3}=3375$ points, while the high-probability guarantee needs only $5^{3}-1=124$ samples.
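The two sample counts above can be reproduced directly for any rectangular bandwidth $k_1\times\cdots\times k_n$. The helper `sample_counts` is a hypothetical name used only for this illustration.

```python
from math import prod

def sample_counts(k):
    """Return (worst-case, high-probability) sample counts for a k_1 x ... x k_n bandwidth."""
    worst_case = sum(k) ** len(k)  # direct extension of the planar results in [2, 3]
    high_prob = prod(k) - 1        # |Lambda| - 1, as in Theorem 3
    return worst_case, high_prob

print(sample_counts([3, 3]))     # (36, 8)
print(sample_counts([5, 5, 5]))  # (3375, 124)
```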

Once the feature matrix $\Phi_{\Lambda}(\mathbf{X})$ is constructed from a sufficient number of points, the coefficient vector $\mathbf{c}$ satisfying (4) can be uniquely identified (up to scaling) using an eigendecomposition. Here, we assume that the bandwidth $\Lambda$ is perfectly known. Curves and surfaces recovered using this sampling result are illustrated in 2-D in Fig. 1 and in 3-D in Fig. 2.
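A minimal sketch of this recovery step, assuming a hypothetical test curve $\cos(2\pi x_1)+\cos(2\pi x_2)=0.5$ with $\Lambda=\{-1,0,1\}^2$ ($|\Lambda|=9$). It draws $N=|\Lambda|-1=8$ random points, forms $\Phi_{\Lambda}(\mathbf{X})$, and takes $\mathbf{c}$ as the null vector of $\Phi_{\Lambda}(\mathbf{X})^{T}$. We use an SVD, which is equivalent to the eigendecomposition mentioned above (the right singular vectors of $\Phi^{T}$ are the eigenvectors of $\overline{\Phi}\Phi^{T}$) but numerically more stable. The helper names and the sampling scheme are our own assumptions; the printed similarity should be close to 1.

```python
import numpy as np

def freq_grid(h, n=2):
    """Rectangular bandwidth {-h, ..., h}^n as an (|Lambda|, n) integer array."""
    r = np.arange(-h, h + 1)
    return np.array(np.meshgrid(*([r] * n), indexing="ij")).reshape(n, -1).T

def feature_matrix(X, freqs):
    """Phi(X): column i is the lifted point phi(x_i); shape (|freqs|, N)."""
    return np.exp(2j * np.pi * (freqs @ X.T))

def sample_curve(N, rng):
    """N independent random points on the hypothetical curve cos(2 pi x1) + cos(2 pi x2) = 0.5."""
    u = rng.uniform(-1 / 3, 1 / 3, N)                               # free coordinate
    arg = np.clip(0.5 - np.cos(2 * np.pi * u), -1.0, 1.0)
    v = rng.choice([-1.0, 1.0], N) * np.arccos(arg) / (2 * np.pi)   # solves psi = 0
    pts = np.stack([u, v], axis=1)
    swap = rng.random(N) < 0.5                                      # curve is symmetric in (x1, x2)
    pts[swap] = pts[swap][:, ::-1]
    return np.mod(pts, 1.0)

def recover_coefficients(X, freqs):
    """Unit-norm c with c^T Phi(X) = 0: the last right singular vector of Phi(X)^T."""
    _, _, Vh = np.linalg.svd(feature_matrix(X, freqs).T)
    return Vh[-1].conj()

rng = np.random.default_rng(0)
freqs = freq_grid(1)                   # Lambda = {-1, 0, 1}^2, |Lambda| = 9
X = sample_curve(len(freqs) - 1, rng)  # N = |Lambda| - 1 = 8 random samples
c_hat = recover_coefficients(X, freqs)

# Ground truth for psi = cos(2 pi x1) + cos(2 pi x2) - 0.5
true = {(1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5, (0, 0): -0.5}
c_true = np.array([true.get((int(k1), int(k2)), 0.0) for k1, k2 in freqs])

# Equals 1 when c_hat is a unit-modulus multiple of c_true / ||c_true||
similarity = abs(np.vdot(c_true, c_hat)) / np.linalg.norm(c_true)
print(similarity)
```

Since $\mathbf{c}$ is only defined up to scaling, the comparison uses the magnitude of the normalized inner product rather than entrywise equality.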

In practice, the exact bandwidth $\Lambda$ of the surface is unknown. We then propose to over-estimate the bandwidth by a set $\Gamma\supset\Lambda$, in which case (5) is modified to

$$\operatorname{rank}(\Phi_{\Gamma}(\mathbf{X}))=|\Gamma|-|\Gamma:\Lambda|, \tag{6}$$

where $\Gamma:\Lambda$ denotes the set of all possible shifts of the set $\Lambda$ within $\Gamma$, i.e., the shifts $\mathbf{s}$ with $\Lambda+\mathbf{s}\subseteq\Gamma$; see [2, 3] for details. We do not give a sampling condition for this case in this paper.
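Although no sampling condition is given for this case, the rank relation (6) is easy to check numerically. The sketch below uses the same hypothetical curve ($\Lambda=\{-1,0,1\}^2$), over-estimates the bandwidth as $\Gamma=\{-2,\dots,2\}^2$, and compares the numerical rank of $\Phi_{\Gamma}(\mathbf{X})$ with $|\Gamma|-|\Gamma:\Lambda|=25-9=16$. The number of samples (60) and the relative threshold $10^{-8}$ used to define the numerical rank are arbitrary assumptions; the two printed numbers should agree.

```python
import numpy as np
from math import prod

def freq_grid(h, n=2):
    """Rectangular bandwidth {-h, ..., h}^n as an (|Gamma|, n) integer array."""
    r = np.arange(-h, h + 1)
    return np.array(np.meshgrid(*([r] * n), indexing="ij")).reshape(n, -1).T

def feature_matrix(X, freqs):
    """Phi(X): column i is the lifted point phi(x_i); shape (|freqs|, N)."""
    return np.exp(2j * np.pi * (freqs @ X.T))

def sample_curve(N, rng):
    """N independent random points on the hypothetical curve cos(2 pi x1) + cos(2 pi x2) = 0.5."""
    u = rng.uniform(-1 / 3, 1 / 3, N)
    arg = np.clip(0.5 - np.cos(2 * np.pi * u), -1.0, 1.0)
    v = rng.choice([-1.0, 1.0], N) * np.arccos(arg) / (2 * np.pi)
    pts = np.stack([u, v], axis=1)
    swap = rng.random(N) < 0.5  # curve is symmetric in (x1, x2)
    pts[swap] = pts[swap][:, ::-1]
    return np.mod(pts, 1.0)

def num_shifts(gamma_shape, lambda_shape):
    """|Gamma : Lambda| for rectangular sets: number of placements of Lambda inside Gamma."""
    return prod(g - k + 1 for g, k in zip(gamma_shape, lambda_shape))

rng = np.random.default_rng(0)
gamma = freq_grid(2)                    # Gamma = {-2, ..., 2}^2, |Gamma| = 25
X = sample_curve(60, rng)               # generous, arbitrary number of samples
s = np.linalg.svd(feature_matrix(X, gamma), compute_uv=False)
numerical_rank = int(np.sum(s > 1e-8 * s[0]))
predicted_rank = len(gamma) - num_shifts((5, 5), (3, 3))  # 25 - 9 = 16
print(numerical_rank, predicted_rank)
```

Each shift $\mathbf{s}\in\Gamma:\Lambda$ contributes one annihilating vector (the coefficients of $\psi(\mathbf{x})e^{j2\pi\mathbf{s}^{T}\mathbf{x}}$), which is why the null space grows from one to $|\Gamma:\Lambda|$ dimensions.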

3.2 Union of irreducible surfaces

Theorem 3 focused on sampling an irreducible surface. In practice, one often encounters composite surfaces, which are the zero level-set of $\psi=\psi_{1}\cdot\psi_{2}\cdots\psi_{M}$, where $\psi_{i}, i=1,\cdots,M$ are irreducible polynomials. The composite curve/surface is then the union of irreducible curves/surfaces, $\mathcal{S}=\bigcup_{i=1}^{M}\mathcal{S}_{i}$, where $\mathcal{S}_{i}$ is the irreducible curve/surface corresponding to $\psi_{i}$ of bandwidth $\Lambda_{i}$. The product relation in the space domain translates to a convolution relation in the Fourier domain, which gives $\mathbf{c}=\mathbf{c}_{1}*\mathbf{c}_{2}*\cdots*\mathbf{c}_{M}$. The bandwidth $\Lambda$ of $\psi$ is thus determined by the individual bandwidths: it is the Minkowski sum $\Lambda_{1}+\cdots+\Lambda_{M}$ (for example, two $3\times 3$ bandwidths yield a $5\times 5$ bandwidth). We now consider the recovery of $\mathcal{S}$ from its samples, which is a nonlinear generalization of the results for unions of subspaces.
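The convolution relation $\mathbf{c}=\mathbf{c}_1*\mathbf{c}_2$ can be checked on random coefficient arrays. In this sketch (helper names are ours), 2-D coefficients are stored in odd-sized arrays centered at frequency $\mathbf{0}$; the full 2-D convolution of two $3\times 3$ arrays is $5\times 5$, so the product has $|\Lambda|=25$.

```python
import numpy as np

def eval_trig(C, x):
    """Evaluate sum_k C[k] exp(j 2 pi k^T x) for an odd-sized 2-D array C centered at k = 0."""
    h1, h2 = (C.shape[0] - 1) // 2, (C.shape[1] - 1) // 2
    k1 = np.arange(-h1, h1 + 1)[:, None]
    k2 = np.arange(-h2, h2 + 1)[None, :]
    return np.sum(C * np.exp(2j * np.pi * (k1 * x[0] + k2 * x[1])))

def conv2_full(A, B):
    """Full 2-D linear convolution: the coefficients of the product of two trig polynomials."""
    out = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1), dtype=complex)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            out[i:i + B.shape[0], j:j + B.shape[1]] += A[i, j] * B
    return out

rng = np.random.default_rng(0)
c1 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # hypothetical psi_1, 3 x 3
c2 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # hypothetical psi_2, 3 x 3
c = conv2_full(c1, c2)
x = rng.uniform(0.0, 1.0, 2)

print(c.shape)  # (5, 5)
print(abs(eval_trig(c1, x) * eval_trig(c2, x) - eval_trig(c, x)))  # round-off level
```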

Theorem 4.

Let $\mathcal{S}=\bigcup_{i=1}^{M}\mathcal{S}_{i}$ be a union of irreducible surfaces, where each $\mathcal{S}_{i}$, $i=1,\cdots,M$, is the zero level-set of an irreducible bandlimited function $\psi_{i}$ of bandwidth $\Lambda_{i}$. Assume that each surface $\mathcal{S}_{i}$ is randomly sampled with $N_{i}$ points, chosen independently on the zero level-set of $\psi_{i}$. Then, the surface $\mathcal{S}$ can be uniquely recovered with high probability if and only if

$$N_{i}\geq|\Lambda_{i}|-1,\quad i=1,\dots,M \tag{7}$$

and

$$N=\sum_{i=1}^{M}N_{i}\geq|\Lambda|-1. \tag{8}$$

This theorem holds in any dimension, including the planar setting. Note that, unlike in Theorem 3, the samples cannot be chosen at random over the whole surface: the number of samples on each irreducible component should be proportional to the complexity of that component. Interestingly, sampling each component in proportion to its complexity, as specified by (7), is not sufficient on its own. For example, consider the planar setting in Fig. 3, where the curve is the union of two curves, each of bandwidth $3\times 3$. According to the above result, each component must be sampled with at least $|\Lambda_{i}|-1=8$ points, while the total number of samples must be at least $|\Lambda|-1=24$ (the product has bandwidth $5\times 5$), which exceeds $2(|\Lambda_{i}|-1)=16$.
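The counts in this example follow from (7) and (8) once the bandwidth of the product is known. The hypothetical helper below assumes that each $\Lambda_i$ is a rectangular grid, so that the Minkowski sum of $M$ grids has side lengths $\sum_i k_{i,d}-(M-1)$ along each dimension $d$.

```python
from math import prod

def union_sample_conditions(shapes):
    """Minimum samples per component, per (7), and in total, per (8), for rectangular bandwidths.

    shapes: list of per-component bandwidth sizes, e.g. [(3, 3), (3, 3)].
    """
    per_component = [prod(k) - 1 for k in shapes]  # |Lambda_i| - 1
    M = len(shapes)
    # Lambda is the Minkowski sum of the Lambda_i; each side grows to sum(k) - (M - 1)
    total_shape = [sum(side) - (M - 1) for side in zip(*shapes)]
    total = prod(total_shape) - 1                  # |Lambda| - 1
    return per_component, total

per_component, total = union_sample_conditions([(3, 3), (3, 3)])
print(per_component, sum(per_component), total)  # [8, 8] 16 24
```

Satisfying (7) for both components gives only 16 samples, short of the 24 required by (8).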

4 Local representation of functions

We now consider the efficient representation of complex functions in high dimensional spaces. We note that learning such functions from measured data is a key problem in machine learning applications. The direct representation of such functions suffers from the curse of dimensionality. The large number of parameters needed for such a representation makes it difficult to learn such functions from few labeled data points.

Fortunately, natural data often lie on simpler structures such as surfaces in high-dimensional spaces. We now show that a bandlimited multidimensional function can be perfectly represented on a union of surfaces using only a fraction of its samples. We focus on bandlimited multidimensional functions of the form

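$$f(\mathbf{x})=\sum_{\mathbf{k}\in\Gamma}a_{\mathbf{k}}\exp(j2\pi\mathbf{k}^{T}\mathbf{x})=\mathbf{a}^{T}\phi_{\Gamma}(\mathbf{x}).$$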

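Note that the direct representation of the function requires $|\Gamma|$ coefficients; since $|\Gamma|$ grows exponentially with the dimension $n$ (e.g., $|\Gamma|=k^{n}$ for a $k\times\cdots\times k$ bandwidth), this representation suffers from the curse of dimensionality.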

We now use (6), which states that the rank of the feature matrix $\Phi_{\Gamma}(\mathbf{X})$ is at most $|\Gamma|-|\Gamma:\Lambda|$, far smaller than $|\Gamma|$, to obtain an efficient representation. Note that the rank property (6) holds only when the points lie on $\mathcal{S}$. As in Theorem 3, if the points are randomly distributed on $\mathcal{S}$, the feature matrix satisfies (6) with high probability. Consequently, the feature vector $\phi_{\Gamma}(\mathbf{x})$ of any point on $\mathcal{S}$ can be expressed as a linear combination of the feature vectors $\phi_{\Gamma}(\mathbf{x}_{i}), i=1,\dots,P$, where $\mathbf{x}_{i}$ are $P=|\Gamma|-|\Gamma:\Lambda|$ points on the curve. Solving for the coefficients of this combination and using the kernel trick, we obtain the following result.

Proposition 5.

Suppose $\mathcal{S}$ is an irreducible surface with bandwidth $\Lambda$. Consider an arbitrary bandlimited function $f(\mathbf{x})=\mathbf{a}^{T}\phi_{\Gamma}(\mathbf{x})$ with bandwidth $\Gamma\supset\Lambda$. For any point $\mathbf{x}$ on $\mathcal{S}$, $f(\mathbf{x})$ can be exactly represented as

$$\hat{f}(\mathbf{x})=\mathbf{p}^{T}\kappa(\mathbf{x}),$$

where $\mathbf{p}^{T}=\begin{bmatrix}f(\mathbf{x}_{1})&f(\mathbf{x}_{2})&\cdots&f(\mathbf{x}_{P})\end{bmatrix}\mathbf{K}^{-1}$ and $\kappa(\mathbf{x})=\begin{bmatrix}k_{\Gamma}(\mathbf{x},\mathbf{x}_{1})&\cdots&k_{\Gamma}(\mathbf{x},\mathbf{x}_{P})\end{bmatrix}^{T}$. Here, $\mathbf{x}_{1},\cdots,\mathbf{x}_{P}$ are $P=|\Gamma|-|\Gamma:\Lambda|$ points on $\mathcal{S}$, and $k_{\Gamma}(\cdot,\cdot)$ denotes the Dirichlet kernel of bandwidth $\Gamma$:

$$k_{\Gamma}(\mathbf{x},\mathbf{y})=\phi_{\Gamma}(\mathbf{x})^{H}\phi_{\Gamma}(\mathbf{y}).$$

$\mathbf{K}$ is the $P\times P$ kernel matrix with entries $\mathbf{K}_{i,j}=k_{\Gamma}(\mathbf{x}_{i},\mathbf{x}_{j})$.

Proposition 5 is illustrated in the 2-D setting in Fig. 4: the local representation in Fig. 4(e) matches the original function in Fig. 4(b) on the curve. This local representation reduces the number of parameters from $169$ to $48$. The reduction in the number of free parameters will be even more significant in high dimensions.
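A minimal sketch of Proposition 5 under assumptions of our own: the same hypothetical curve with $\Lambda=\{-1,0,1\}^2$, a function with random coefficients $\mathbf{a}$ on $\Gamma=\{-2,\dots,2\}^2$, and $P=|\Gamma|-|\Gamma:\Lambda|=25-9=16$ random anchor points. Because this $\Gamma$ is symmetric about the origin, the Dirichlet kernel is real and symmetric, so $k_{\Gamma}(\mathbf{x},\mathbf{x}_i)=k_{\Gamma}(\mathbf{x}_i,\mathbf{x})$ and the ordering used in the code matches $\kappa(\mathbf{x})$. Helper names are hypothetical.

```python
import numpy as np

def freq_grid(h, n=2):
    """Rectangular bandwidth {-h, ..., h}^n as an (|Gamma|, n) integer array."""
    r = np.arange(-h, h + 1)
    return np.array(np.meshgrid(*([r] * n), indexing="ij")).reshape(n, -1).T

def feature_matrix(X, freqs):
    """Phi(X): column i is the lifted point phi(x_i); shape (|freqs|, N)."""
    return np.exp(2j * np.pi * (freqs @ X.T))

def sample_curve(N, rng):
    """N independent random points on the hypothetical curve cos(2 pi x1) + cos(2 pi x2) = 0.5."""
    u = rng.uniform(-1 / 3, 1 / 3, N)
    arg = np.clip(0.5 - np.cos(2 * np.pi * u), -1.0, 1.0)
    v = rng.choice([-1.0, 1.0], N) * np.arccos(arg) / (2 * np.pi)
    pts = np.stack([u, v], axis=1)
    swap = rng.random(N) < 0.5  # curve is symmetric in (x1, x2)
    pts[swap] = pts[swap][:, ::-1]
    return np.mod(pts, 1.0)

def dirichlet_kernel(X, Y, freqs):
    """Matrix of k(x_i, y_j) = phi(x_i)^H phi(y_j) = sum_k exp(j 2 pi k^T (y_j - x_i))."""
    return feature_matrix(X, freqs).conj().T @ feature_matrix(Y, freqs)

rng = np.random.default_rng(0)
gamma = freq_grid(2)                 # Gamma = {-2, ..., 2}^2, |Gamma| = 25
P = len(gamma) - (5 - 3 + 1) ** 2    # |Gamma| - |Gamma : Lambda| = 16 anchors
a = rng.standard_normal(len(gamma)) + 1j * rng.standard_normal(len(gamma))

def f(X):
    """Random bandlimited function f(x) = a^T phi_Gamma(x), evaluated at the rows of X."""
    return a @ feature_matrix(X, gamma)

anchors = sample_curve(P, rng)                 # x_1, ..., x_P on S
K = dirichlet_kernel(anchors, anchors, gamma)  # P x P kernel matrix
p = np.linalg.solve(K.T, f(anchors))           # p^T = [f(x_1) ... f(x_P)] K^{-1}

def f_hat(X):
    """Local representation f_hat(x) = p^T kappa(x): a two-layer kernel network."""
    return p @ dirichlet_kernel(anchors, X, gamma)

X_on = sample_curve(5, rng)                    # new points on the curve
X_off = np.array([[0.0, 0.0], [0.5, 0.5]])     # points off the curve
print(np.max(np.abs(f_hat(X_on) - f(X_on))))   # max error on S
print(np.max(np.abs(f_hat(X_off) - f(X_off)))) # max error off S
```

On the curve the first difference should be at round-off level, while off the curve the two functions generally disagree, which is the locality discussed in the introduction. The 16 values $f(\mathbf{x}_i)$ replace the 25 coefficients in $\mathbf{a}$.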

5 Conclusion

In this paper, we considered the recovery of surfaces from few of their samples. We showed that the exponential feature maps of data points on surfaces lie in low-dimensional subspaces. The low-rank structure of the feature matrix is used to recover the surface from few measurements. Our results show that the surface can be uniquely recovered with high probability if the number of random samples is at least the number of degrees of freedom of the surface. These results also provide an efficient approach for the local representation of multidimensional functions on surfaces from few measurements.

Bibliography

[1] Sam T. Roweis and Lawrence K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[2] Sunrita Poddar and Mathews Jacob, “Recovery of point clouds on surfaces: Application to image reconstruction,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 1272–1275.
[3] Sunrita Poddar and Mathews Jacob, “Recovery of noisy points on bandlimited surfaces: Kernel methods re-explained,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 4024–4028.
[4] Sunrita Poddar, Qing Zou, and Mathews Jacob, “Sampling of planar curves: Theory and fast algorithms,” arXiv preprint arXiv:1810.11575, 2018.
[5] Greg Ongie and Mathews Jacob, “Off-the-grid recovery of piecewise constant images from few Fourier samples,” SIAM Journal on Imaging Sciences, vol. 9, no. 3, pp. 1004–1041, 2016.
