CURE: Curvature Regularization For Missing Data Recovery

Bin Dong; Haocheng Ju; Yiping Lu; Zuoqiang Shi

arXiv:1901.09548 · cs.CV · December 16, 2019


TL;DR

This paper introduces CURE, a novel regularization combining low-dimensional manifold constraints with curvature smoothness, improving missing data recovery in imaging tasks.

Contribution

The paper proposes CURE, a new regularization that combines low dimensionality of the underlying manifold with curvature smoothness, improving image inpainting and semi-supervised learning.

Findings

01

CURE outperforms LDMM in image inpainting.

02

WeCURE improves semi-supervised learning results over WNLL.

03

Numerical experiments validate the effectiveness of the proposed methods.

Abstract

Missing data recovery is an important yet challenging problem in imaging and data science. Successful models often adopt carefully chosen regularization. Recently, the low dimensional manifold model (LDMM) was introduced by S. Osher et al. and shown to be effective in image inpainting. They observed that enforcing low dimensionality on the image patch manifold serves as a good image regularizer. In this paper, we observe that the low dimensional manifold regularization alone is sometimes not enough, and smoothness is needed as well. To this end, we introduce a new regularization by combining the low dimensional manifold regularization with a higher order Curvature Regularization, and we call this new regularization CURE for short. The key step in solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner…

Tables (4)

Table 1: Classification accuracy (%) on MNIST with 50, 100 and 700 labeled images out of 70,000. The best results are in red and the second best results are in blue.

Method	50/70000	100/70000	700/70000
WNLL[42]	73.60	87.84	93.25
WNTV[30]	78.35	89.86	94.08
CURE	88.40	92.42	96.13
WeCURE	90.48	93.49	96.12

Table 2: Classification accuracy (%) on COIL20 and ISOLET with 2%, 5% and 10% labeled data. The best results are in red and the second best results are in blue.

Method	COIL20 2%	COIL20 5%	COIL20 10%	ISOLET 2%	ISOLET 5%	ISOLET 10%
GL	55.61	68.50	76.11	31.19	45.51	66.27
WNLL[42]	59.59	74.13	80.65	49.12	61.90	73.05
CURE	59.73	74.77	80.91	49.14	61.94	73.23
WeCURE	63.29	77.65	84.76	52.65	64.92	76.50

Table 3: PSNR (dB) of different methods on the Set12 dataset with sampling rates of 10%, 15% and 20%. The best results are indicated in red and highlighted in bold. The second best results are indicated in blue and underlined.

Images	C.man	House	Peppers	Starfish	Monarch	Airplane	Parrot	Lena	Barbara	Boat	Man	Couple	Average
Sample Rate	10%
LDMM	19.9329	24.8723	20.6103	19.9285	19.3395	19.9612	19.5449	26.1005	23.3176	22.6681	23.9415	22.7225	21.9117
WNLL	21.9993	28.3325	23.3210	22.2705	22.4218	21.7954	21.6121	28.5089	26.3732	24.8116	25.8126	25.0263	24.3571
CURE	21.7095	28.3023	23.3315	22.0185	22.0650	21.4078	21.5080	28.3013	26.3031	24.6798	25.7207	24.9033	24.1876
WeCURE	21.8571	28.7967	23.7416	22.3540	22.5829	21.4335	21.7753	28.7926	26.7155	25.0060	25.7145	25.1940	24.4970
Sample Rate	15%
LDMM	21.0948	26.4075	21.6434	20.9887	20.9843	21.0712	21.3412	27.7591	25.6175	23.8791	25.1269	24.0065	23.3267
WNLL	23.3052	29.1647	25.0635	23.5147	23.7171	22.7292	22.5851	29.5856	27.7837	25.8633	26.9433	26.2245	25.5400
CURE	22.8514	29.5745	25.1007	23.4509	23.8326	22.5211	22.4579	29.6253	27.7315	25.7653	26.9278	26.1798	25.5016
WeCURE	23.0993	30.9540	25.7840	24.0722	24.2587	22.8246	22.8708	30.1331	28.5615	26.2943	27.3484	26.7266	26.0773
Sample Rate	20%
LDMM	21.9057	28.2924	22.7767	22.6264	22.4175	22.1073	21.9409	28.9160	26.8121	24.8777	26.2350	25.0044	24.4927
WNLL	23.9478	30.8222	25.8068	24.5382	24.6738	23.8359	23.2844	30.5140	28.7357	26.6614	27.7806	26.7532	26.4462
CURE	23.7846	31.4606	25.7513	24.7232	24.8360	23.7147	23.5282	30.6271	28.9715	26.6736	27.8198	26.8165	26.5589
WeCURE	24.5007	32.1789	26.6428	25.3982	25.5151	24.1406	24.0625	31.3711	29.7794	27.3033	28.3473	27.4934	27.2278

Table 4: SSIM of different methods on the Set12 dataset with sampling rates of 10%, 15% and 20%. The best results are indicated in red and highlighted in bold. The second best results are indicated in blue and underlined.

Images	C.man	House	Peppers	Starfish	Monarch	Airplane	Parrot	Lena	Barbara	Boat	Man	Couple	Average
Sample Rate	10%
LDMM	0.2677	0.3406	0.4406	0.3856	0.4870	0.3338	0.4560	0.4508	0.4881	0.3121	0.3469	0.3389	0.3874
WNLL	0.3557	0.4236	0.5681	0.5415	0.6523	0.4352	0.5680	0.5316	0.6308	0.4383	0.4787	0.5123	0.5113
CURE	0.3591	0.4337	0.5849	0.5382	0.6537	0.4324	0.5733	0.5356	0.6392	0.4409	0.4817	0.5240	0.5164
WeCURE	0.3726	0.4397	0.6042	0.5721	0.6842	0.4448	0.5953	0.5402	0.6572	0.4628	0.5051	0.5476	0.5355
Sample Rate	15%
LDMM	0.3622	0.4288	0.5308	0.4848	0.5986	0.4252	0.5464	0.5382	0.6164	0.4187	0.4483	0.4619	0.4884
WNLL	0.4456	0.5053	0.6380	0.6196	0.7076	0.5052	0.6247	0.5931	0.6964	0.5130	0.5544	0.5911	0.5828
CURE	0.4464	0.5294	0.6610	0.6294	0.7299	0.5115	0.6435	0.5994	0.7068	0.5226	0.5637	0.6067	0.5959
WeCURE	0.4577	0.5459	0.6766	0.6658	0.7473	0.5273	0.6621	0.6102	0.7275	0.5462	0.5939	0.6308	0.6159
Sample Rate	20%
LDMM	0.4385	0.5148	0.5980	0.5783	0.6692	0.5003	0.6074	0.5997	0.6840	0.5003	0.5295	0.5501	0.5642
WNLL	0.4970	0.5735	0.6856	0.6691	0.7439	0.5684	0.6673	0.6376	0.7373	0.5722	0.6062	0.6364	0.6329
CURE	0.5063	0.6044	0.7051	0.6889	0.7687	0.5847	0.6850	0.6457	0.7515	0.5882	0.6203	0.6571	0.6505
WeCURE	0.5270	0.6167	0.7241	0.7214	0.7859	0.6009	0.7017	0.6570	0.7683	0.6093	0.6492	0.6806	0.6702

Equations (247)

\text{LDMM}(u)=\frac{1}{2}\|\nabla_{\mathcal{M}}u\|_{L^{2}(\mathcal{M})}^{2}.

\dim(\mathcal{M})(\bm{x})=\sum_{i=1}^{d}\|\nabla_{\mathcal{M}}\alpha_{i}(\bm{x})\|^{2}

\text{CURE}(u)=\text{LDMM}(u)+\frac{\lambda}{2}\int_{\mathcal{M}}{\color[rgb]{0,0,1}(\Delta_{\mathcal{M}}u)^{2}},

x=\psi(\alpha):U\subset\mathbb{R}^{k}\to\mathcal{M}\subset\mathbb{R}^{d}

u_{i}(x)=x_{i},\quad 1\leq i\leq d.

\text{CURE}(u)=\text{LDMM}(u)+\frac{\lambda}{2}\int_{\mathcal{M}}{\color[rgb]{0,0,1}(\Delta_{\mathcal{M}}u)^{2}},

\nabla_{\mathcal{M}}u(\bm{x},\bm{y})\approx\sqrt{w(\bm{x},\bm{y})}\,(u(\bm{y})-u(\bm{x}))=:\nabla_{P}u(\bm{x},\bm{y}),\quad\bm{x},\bm{y}\in P\subset\mathcal{M},

\text{LDMM}(u)\approx\frac{1}{2}\sum_{\bm{x},\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))^{2}=\|\nabla_{P}u\|_{2}^{2}.

-\partial_{u}\left(\|\nabla_{P}u\|_{2}^{2}\right)=\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y})),

GLu(\bm{x}):=\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y})).

\min_{u}\ \|\nabla_{P}u\|_{2}^{2}+\frac{\lambda}{2}\|GLu\|_{2}^{2}.

\text{WNLL}(u)=\|(\nabla_{P}u)_{|P\backslash S}\|_{2}^{2}+\frac{|P|}{|S|}\|(\nabla_{P}u)_{|S}\|_{2}^{2},

\|(\nabla_{P}u)_{|S}\|_{2}^{2}:=\sum_{\bm{x}\in S,\,\bm{y}\in P}\frac{1}{2}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))^{2},

\min_{u}\ \text{WeCURE}(u):=\text{WNLL}(u)+\lambda\left[\|(GLu)_{|P\backslash S}\|_{2}^{2}+\frac{|P|}{|S|}\|(GLu)_{|S}\|_{2}^{2}\right],

\|(GLu)_{|S}\|_{2}^{2}=\sum_{\bm{x}\in S}\Big(\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))\Big)^{2}

\min_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2},

\partial_{u|_{P\backslash S}}\text{WeCURE}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)=\partial_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda\partial_{u|_{P\backslash S}}\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2}.

\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2}=\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\sqrt{D}\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right]\right\|_{2}^{2}.

\partial_{u|_{P\backslash S}}\text{WeCURE}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)=\partial_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\lambda GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right].

\begin{array}[]{ll}\left(GL\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\gamma\cdot DW\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\lambda GL^{T}\cdot D\cdot GL\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]\right)({\bm{x}})&\\ \hskip 21.68121pt=\sum_{\bm{y}\in S}w(\bm{x},\bm{y})g(\bm{y})+\gamma\sum_{\bm{y}\in S}w(\bm{y},\bm{x})g(\bm{y})-\lambda\left(GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right]\right)({\bm{x}}),&\bm{x}\in P\backslash S,\end{array}

S=\bigcup_{i=1}^{l}S_{i},

\phi_{i}(\bm{x})=1,\ \bm{x}\in S_{i};\quad\phi_{i}(\bm{x})=0,\ \bm{x}\in S\backslash S_{i},

L(\bm{x})=k,\quad\text{where }k=\arg\max_{1\leq i\leq l}\phi_{i}(\bm{x})

w(\bm{x},\bm{y})=\exp\left(-\frac{\|\bm{x}-\bm{y}\|^{2}}{\sigma(\bm{x})^{2}}\right),

p_{ij}(f)=\left\{f(\tilde{i},\tilde{j}):i-\frac{s_{1}-1}{2}\leq\tilde{i}\leq i+\frac{s_{1}-1}{2},\ j-\frac{s_{2}-1}{2}\leq\tilde{j}\leq j+\frac{s_{2}-1}{2}\right\},

P(f)=\left\{p_{ij}(f):(i,j)\in\{1,2,\dots,m\}\times\{1,2,\dots,n\}\right\}\subset\mathbb{R}^{d},\quad d=s_{1}\cdot s_{2}.

u(p_{ij}(f))=f(i,j),

u^{n+1}(\bm{x})=f(\bm{x}),\quad\bm{x}\in S^{n}.

(\bar{P}u)(x)=\left[(Pu)(x),\ \lambda\bar{x}\right]

\bar{x}=\left(\frac{x_{1}\|f|_{\Omega}\|_{\infty}}{m},\ \frac{x_{2}\|f|_{\Omega}\|_{\infty}}{n}\right),


Taxonomy

Topics: Numerical methods in inverse problems · Image and Signal Denoising Methods · Sparse and Compressive Sensing Techniques

Full text


Bin Dong, Beijing International Center for Mathematical Research, Peking University, Beijing, 100871 China.

Haocheng Ju, School of Mathematical Sciences, Peking University, Beijing, 100871 China.

Yiping Lu, Institute for Computational and Mathematical Engineering (ICME), Stanford University, Stanford, CA, 94305.

Zuoqiang Shi, Department of Mathematical Sciences, Yau Mathematical Sciences Center, Tsinghua University, Beijing, 100084 China.

Abstract

Missing data recovery is an important yet challenging problem in imaging and data science. Successful models often adopt carefully chosen regularization. Recently, the low dimensional manifold model (LDMM) was introduced by [36] and shown to be effective in image inpainting. The authors of [36] observed that enforcing low dimensionality on the image patch manifold serves as a good image regularizer. In this paper, we observe that the low dimensional manifold regularization alone is sometimes not enough, and smoothness is needed as well. To this end, we introduce a new regularization by combining the low dimensional manifold regularization with a higher order CUrvature REgularization, and we call this new regularization CURE for short. The key step of CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner as the weighted nonlocal Laplacian (WNLL) method [42]. Numerical experiments for image inpainting and semi-supervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL, respectively.

keywords:

Graph Laplacian, Nonlocal Methods, Point Cloud, Biharmonic Equation, Interpolation, Image Inpainting.

AMS subject classifications: 62H35, 65D18, 68U10, 58C40, 58J50

1 Introduction

Missing data recovery is a fundamental problem in imaging science and data analysis. In many cases, it can be formulated as a function interpolation problem in multi-dimensional spaces. Let $u:\mathbb{R}^{d}\to\mathbb{R}$ be an unknown function. We would like to acquire its values on a set of points $P=\{\bm{p}_{1},\ldots,\bm{p}_{n}\}\subset\mathbb{R}^{d}$. However, due to practical limitations, we are only able to observe its values on a subset $S=\{\bm{s}_{1},\ldots,\bm{s}_{m}\}\subset P$. The goal of missing data recovery is to reconstruct the missing values of $u$ from the observed values on $S$. In this paper, we focus on two typical and important missing data recovery tasks, namely semi-supervised learning and image inpainting, although the proposed method can also be applied to other related tasks.

Since missing data recovery is an under-determined inverse problem, we can only hope to recover the missing values of $u$ if we have certain prior knowledge of $u$, e.g., that $u$ belongs to a certain function class or has certain mathematical or statistical properties. Successful models include the Rudin–Osher–Fatemi (ROF) model [39] and its variants [26, 4, 13], applied harmonic analysis models such as wavelets [44, 18], curvelets [43], shearlets [22, 32] and wavelet frames [2, 9, 12, 10, 47, 20], and Bayesian statistics based methods [38, 40, 48], among many others.

More recently, researchers have started to use low dimensional manifolds to describe the underlying relationship between data points, which serves as an effective geometric prior on the interpolant. For example, [36, 37] observed that image patches, regarded as data points in a high dimensional space, often lie on a low dimensional manifold, and [15, 49] allowed the data to lie close to (but not necessarily on) a low dimensional manifold.

To harvest the low dimensional property of data, [36] applied the following Dirichlet energy [50] to regularize the dimension of the embedded manifold $\mathcal{M}$

$$\text{LDMM}(u)=\frac{1}{2}\|\nabla_{\mathcal{M}}u\|_{L^{2}(\mathcal{M})}^{2}. \tag{1}$$

In [36], the authors gave a geometric interpretation of the Dirichlet regularizer. They showed that the dimension of a smooth manifold embedded in $\mathbb{R}^{d}$ can be calculated by a simple formula

$$\dim(\mathcal{M})(\bm{x})=\sum_{i=1}^{d}\|\nabla_{\mathcal{M}}\alpha_{i}(\bm{x})\|^{2},$$

where $\alpha_{i}$ is the $i$-th coordinate function, i.e., $\alpha_{i}(\bm{x})=x_{i}$ for any $\bm{x}=(x_{1},\cdots,x_{d})\in\mathcal{M}\subset\mathbb{R}^{d}$.
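The formula can be checked on simple examples. In the sketch below (our own illustration, not code from the paper), the tangent space at $\bm{x}$ is assumed to be known and spanned by the columns of a $d\times k$ matrix; the tangential gradient of $\alpha_{i}$ is then the orthogonal projection of the $i$-th standard basis vector onto that tangent space, so the sum reduces to the trace of the projector, which is $k$. The function name `local_dimension` is hypothetical.

```python
import numpy as np

def local_dimension(tangent_vectors: np.ndarray) -> float:
    """Return sum_i |grad_M alpha_i(x)|^2 for the coordinate functions alpha_i(x) = x_i.

    tangent_vectors: (d, k) array whose linearly independent columns span T_x M
    (an assumed input; LDMM itself never has this information).
    """
    q, _ = np.linalg.qr(tangent_vectors)   # orthonormal basis of T_x M
    proj = q @ q.T                          # orthogonal projector onto T_x M
    d = proj.shape[0]
    grads = proj @ np.eye(d)                # column i = tangential gradient of alpha_i
    return float(np.sum(grads ** 2))        # equals trace(proj) = k

# A curve in R^2 (k = 1) and a plane in R^3 (k = 2).
print(round(local_dimension(np.array([[0.0], [1.0]])), 10))                        # 1.0
print(round(local_dimension(np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])), 10))  # 2.0
```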

Hence, minimizing the Dirichlet energy penalizes the (local) dimension of the underlying manifold. Accordingly, the authors referred to their method as the low dimensional manifold model (LDMM). To recover missing data, they proposed to minimize the Dirichlet energy subject to the constraints $u(\bm{s})=g(\bm{s})$, $\forall\bm{s}\in S$, where $g:S\to\mathbb{R}$ denotes the observed part of the underlying function $u$.

1.1 Higher Order Regularization

Low dimensionality alone does not ensure smoothness of the reconstructed manifold, which may lead to unsatisfactory results [34, 23, 11]. As a simple demonstration, Figure 1 shows a degenerate interpolation of the two data points labeled in red. Although the interpolated surface is also a low dimensional manifold, it is clearly not a smooth interpolation.

In this paper, we seek an interpolation that is not only low dimensional but also smooth. To this end, in addition to the Dirichlet energy, we introduce a CUrvature REgularization (CURE) term via the biharmonic operator. The proposed CURE energy reads as follows

$$\text{CURE}(u)=\text{LDMM}(u)+\frac{\lambda}{2}\int_{\mathcal{M}}(\Delta_{\mathcal{M}}u)^{2},$$

where LDMM is given by (1). Note that regularizing the curvature through a higher order energy term has previously been proposed in image processing [41]. However, to the best of our knowledge, we are the first to promote curvature-like regularization for nonlocal image processing. Furthermore, inspired by the weighted nonlocal Laplacian (WNLL) method proposed in [42], which preserves the symmetry of the Laplace operator, we propose a weighted CURE (WeCURE) model that significantly improves on the CURE model. To demonstrate the effectiveness of CURE and WeCURE, we test them on semi-supervised learning and image inpainting tasks. Numerical results show that CURE/WeCURE produces significantly better results than LDMM/WNLL in both tasks. A glimpse of the image inpainting results is shown in Figure 2, where we can see the significant improvement of CURE over LDMM and of WeCURE over WNLL. More details and numerical results can be found in Section 3 and Section 4.

1.2 Other Related Works

Nonlocal patch-based image restoration methods [16, 17, 7, 6, 26] have achieved great success in the literature. In addition, [24, 3, 19] introduced different graph Laplacian-based regularizations on manifolds and graphs. Our method, however, focuses on both smoothness and low dimensionality of the underlying data manifold. The work most similar to ours is [1], where the authors also introduced a higher order regularization for semi-supervised learning. The differences are threefold. First, we extend the task to image inpainting rather than only semi-supervised learning. Second, we introduce a curvature perspective on the higher order regularization. Last but not least, the newly proposed weighted version of CURE, i.e., WeCURE, gives a significant performance boost in both image inpainting and semi-supervised learning.

Another approach to regularizing the dimension of the manifold is low-rank matrix completion [27, 28]. The basic idea is to group the patches by similarity and penalize the rank (or nuclear norm) of the matrix obtained by stacking the similar patches. The work in this paper reveals a benefit of PDE-based approaches: higher order information, such as curvature, can be naturally incorporated into the model.

1.3 Organization of the Paper

The paper is organized as follows. The proposed CURE and WeCURE models are introduced in Section 2. Numerical comparisons of CURE and WeCURE with LDMM and WNLL for semi-supervised learning and image inpainting are presented in Section 3 and Section 4, respectively. The general setting of the asymptotic analysis of the proposed model is presented in Section 5, and the complete proof is given in Sections A.4 and A.5. Conclusions are given in Section 6.

2 Curvature Regularization (CURE): Model and General Algorithm

In this section, we first propose the CURE model and a weighted version of CURE. Then, we will discuss how (We)CURE can be applied to missing data recovery in general.

2.1 CURE

Let $\mathcal{M}$ be a smooth manifold embedded in $\mathbb{R}^{d}$ and locally parameterized as

$$x=\psi(\alpha):U\subset\mathbb{R}^{k}\to\mathcal{M}\subset\mathbb{R}^{d},$$

where $k=\dim_{x}(\mathcal{M})$ is the local dimension of $\mathcal{M}$ at $x$, $\alpha=(\alpha^{1},\ldots,\alpha^{k})^{\top}\in\mathbb{R}^{k}$ and $x=(x_{1},\ldots,x_{d})^{\top}\in\mathcal{M}$. Let $\bm{u}=(u_{1},u_{2},\cdots,u_{d})$ be the coordinate functions on $\mathcal{M}$, i.e., for $x\in\mathcal{M}$

$$u_{i}(x)=x_{i},\quad 1\leq i\leq d.$$

To enforce smoothness of the underlying manifold, we further regularize its curvature. Recall that the mean curvature of a manifold $\mathcal{M}$ is defined as the trace of the second fundamental form [33], i.e., $H\vec{n}=g^{ij}\nabla_{i}\nabla_{j}X$. Here $g^{ij}$ denotes the entries of the inverse of the metric tensor $g_{i^{\prime}j^{\prime}}=\left<\partial_{i^{\prime}},\partial_{j^{\prime}}\right>=\sum_{l=1}^{d}\partial_{i^{\prime}}\psi^{l}\partial_{j^{\prime}}\psi^{l}$, where $\psi=(\psi^{1},\ldots,\psi^{d})$ is the local parameterization above. If the coordinate function $\bm{u}(x)$ is an isometric immersion, the mean curvature can be calculated as $\|\Delta\bm{u}\|_{2}/k$, where $\Delta\bm{u}=(\Delta u_{1},\Delta u_{2},\cdots,\Delta u_{d})$ (see [33] for details).
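As a sanity check of the relation between mean curvature and $\|\Delta\bm{u}\|_{2}/k$, consider a circle of radius $R$ in $\mathbb{R}^{2}$, where $k=1$ and the expected value is $1/R$. The sketch below is an illustrative assumption, not part of the paper's method: it samples the circle uniformly in arc length, so that the Laplace–Beltrami operator reduces to the second derivative with respect to arc length, and approximates it by a periodic central difference. The function name is hypothetical.

```python
import numpy as np

def circle_mean_curvature(radius: float, num_points: int = 2000) -> float:
    """Estimate ||Delta u||_2 / k for a circle of the given radius (k = 1).

    The coordinate functions u = (x_1, x_2) are sampled uniformly in arc length s,
    so Delta_M u = d^2 u / ds^2, approximated by a periodic central difference.
    """
    s = np.linspace(0.0, 2 * np.pi * radius, num_points, endpoint=False)
    h = s[1] - s[0]
    u = np.stack([radius * np.cos(s / radius), radius * np.sin(s / radius)])  # (2, N)
    lap = (np.roll(u, -1, axis=1) - 2 * u + np.roll(u, 1, axis=1)) / h**2
    k = 1  # intrinsic dimension of a curve
    return float(np.mean(np.linalg.norm(lap, axis=0)) / k)

print(circle_mean_curvature(2.0))  # close to 1 / 2.0 = 0.5
```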

Now, we are ready to introduce the CURE energy in continuum setting:

$$\text{CURE}(u)=\text{LDMM}(u)+\frac{\lambda}{2}\int_{\mathcal{M}}(\Delta_{\mathcal{M}}u)^{2},$$

where $\text{LDMM}(u)$ is given by (1). The gradient $\nabla_{\mathcal{M}}u$ is commonly approximated by the nonlocal gradient in the discrete setting

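$$\nabla_{\mathcal{M}}u(\bm{x},\bm{y})\approx\sqrt{w(\bm{x},\bm{y})}\,(u(\bm{y})-u(\bm{x}))=:\nabla_{P}u(\bm{x},\bm{y}),\quad\bm{x},\bm{y}\in P\subset\mathcal{M},$$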

where $P$ is a set with $n$ points on the manifold $\mathcal{M}$ . Then,

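$$\text{LDMM}(u)\approx\frac{1}{2}\sum_{\bm{x},\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))^{2}=\|\nabla_{P}u\|_{2}^{2}.$$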

Here, $w(\bm{x},\bm{y})$ is a given symmetric weight function, often chosen to be the Gaussian weight $w(\bm{x},\bm{y})=\exp\left(-\frac{\|\bm{x}-\bm{y}\|^{2}}{\sigma^{2}}\right)$, where $\sigma$ is a parameter and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{d}$. The negative of the first variation of $\|\nabla_{P}u\|_{2}^{2}$ takes the form

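$$-\partial_{u}\left(\|\nabla_{P}u\|_{2}^{2}\right)=\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y})),$$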

which is the nonlocal Laplacian that has been used in image processing [5, 6, 24, 26]. It is also called graph Laplacian in spectral graph and machine learning literature [14, 50]. To simplify the notation, we use $GL$ to denote the graph Laplacian [31, 45, 46]:

$$GLu(\bm{x}):=\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y})).$$
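The discrete objects above can be assembled as dense matrices for small point clouds. The sketch below assumes the plain Gaussian weight with a fixed $\sigma$ mentioned earlier (the experiments in Sections 3 and 4 use a locally scaled, truncated weight instead); the helper names are hypothetical. It checks numerically that $GL$ is the matrix $\operatorname{diag}(W\mathbf{1})-W$ and that, for symmetric weights, $\frac{1}{2}\sum_{\bm{x},\bm{y}}w(\bm{x},\bm{y})(u(\bm{x})-u(\bm{y}))^{2}=u^{\top}GL\,u$.

```python
import numpy as np

def gaussian_weights(points, sigma):
    """Dense symmetric weights w(x, y) = exp(-||x - y||^2 / sigma^2)."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / sigma**2)

def graph_laplacian(w):
    """Matrix of (GL u)(x) = sum_y w(x, y) (u(x) - u(y)), i.e. diag(W 1) - W."""
    return np.diag(w.sum(axis=1)) - w

rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 3))      # 30 points in R^3
u = rng.normal(size=30)             # a function sampled on the points
W = gaussian_weights(pts, sigma=1.0)
GL = graph_laplacian(W)

# Discrete Dirichlet energy 1/2 sum_{x,y} w(x,y) (u(x) - u(y))^2 ...
energy = 0.5 * np.sum(W * (u[:, None] - u[None, :]) ** 2)
# ... coincides with the quadratic form u^T GL u for symmetric weights.
print(np.isclose(energy, u @ GL @ u))  # True
```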

Now, the proposed CURE model can be cast as the following optimization problem in the discrete setting

$$\min_{u}\ \|\nabla_{P}u\|_{2}^{2}+\frac{\lambda}{2}\|GLu\|_{2}^{2}.$$

In [42], a weighted nonlocal Laplacian (WNLL) method was introduced to balance the loss at both labeled and unlabeled points and to preserve the symmetry of the Laplace operator at the same time. Let $S\subset P$ be the set of labeled points. The WNLL model in the discrete setting is given by

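$$\text{WNLL}(u)=\|(\nabla_{P}u)_{|P\backslash S}\|_{2}^{2}+\frac{|P|}{|S|}\|(\nabla_{P}u)_{|S}\|_{2}^{2},$$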

where

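$$\|(\nabla_{P}u)_{|S}\|_{2}^{2}:=\sum_{\bm{x}\in S,\,\bm{y}\in P}\frac{1}{2}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))^{2},$$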

and similarly for $\|\left(\nabla_{P}u\right)_{|P\backslash S}\|_{2}^{2}$ .

Following a similar idea as that in WNLL, we propose the weighted CURE model (WeCURE) in the discrete setting

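$$\min_{u}\ \text{WeCURE}(u):=\text{WNLL}(u)+\lambda\left[\|(GLu)_{|P\backslash S}\|_{2}^{2}+\frac{|P|}{|S|}\|(GLu)_{|S}\|_{2}^{2}\right], \tag{3}$$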

where

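$$\|(GLu)_{|S}\|_{2}^{2}=\sum_{\bm{x}\in S}\Big(\sum_{\bm{y}\in P}w(\bm{x},\bm{y})\,(u(\bm{x})-u(\bm{y}))\Big)^{2}$$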

and similarly for $\|\left(GLu\right)_{|P\backslash S}\|_{2}^{2}$ .
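The weighted curvature term can be written compactly as $\|\sqrt{D}\cdot GLu\|_{2}^{2}$, where $D$ assigns weight $|P|/|S|$ to labeled points and 1 elsewhere; this is the form used in the next subsection. The following sketch (illustrative only; the function name and random test data are hypothetical) evaluates WeCURE directly from the definitions above and confirms the matrix form numerically.

```python
import numpy as np

def wecure_energy(w, u, labeled, lam):
    """Evaluate WeCURE(u) = WNLL(u) + lam * [weighted curvature terms] from the definitions.

    w: (n, n) weight matrix, u: (n,) function values on P,
    labeled: boolean mask marking the points of S.
    """
    ratio = len(u) / labeled.sum()            # |P| / |S|
    diff = u[:, None] - u[None, :]            # u(x) - u(y) for all pairs
    # Per-point Dirichlet terms: sum_y 1/2 w(x, y) (u(x) - u(y))^2.
    grad_sq = 0.5 * np.sum(w * diff**2, axis=1)
    wnll = grad_sq[~labeled].sum() + ratio * grad_sq[labeled].sum()
    gl_u = np.sum(w * diff, axis=1)           # (GL u)(x) = sum_y w(x, y) (u(x) - u(y))
    curvature = (gl_u[~labeled] ** 2).sum() + ratio * (gl_u[labeled] ** 2).sum()
    return wnll + lam * curvature

# Check that the weighted curvature term equals ||sqrt(D) GL u||_2^2.
rng = np.random.default_rng(1)
n = 20
w = rng.random((n, n))
w = (w + w.T) / 2                              # symmetric weights
u = rng.normal(size=n)
labeled = np.zeros(n, dtype=bool)
labeled[:4] = True                             # |S| = 4
d = np.where(labeled, n / labeled.sum(), 1.0)  # diagonal of D
gl = np.diag(w.sum(axis=1)) - w
matrix_form = np.sum(d * (gl @ u) ** 2)
direct = wecure_energy(w, u, labeled, lam=1.0) - wecure_energy(w, u, labeled, lam=0.0)
print(np.isclose(direct, matrix_form))  # True
```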

2.2 CURE for Missing Data Recovery

For missing data recovery, we simply minimize the CURE or WeCURE energy subject to the constraints $u(\bm{x})=g(\bm{x}),\ \bm{x}\in S$, where $g$ denotes the observed values of the underlying function to be recovered. We discuss implementation details for WeCURE; CURE is the special case of WeCURE with all weights equal to 1.

Recall the definition of the WeCURE energy (3) and note that $u(\bm{x})=g(\bm{x}),\bm{x}\in S$. Then, the WeCURE model for missing data recovery can be rewritten as

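$$\min_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2}, \tag{4}$$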

where $D=\operatorname{diag}\{d_{1},d_{2},\ldots,d_{|P|}\}$ with $d_{i}=1$ for $\bm{x}_{i}\in P\backslash S$ and $d_{i}=\frac{|P|}{|S|}$ for $\bm{x}_{i}\in S$, and $GL$ is the $|P|\times|P|$ graph Laplacian matrix. The first variation of (4) is

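$$\partial_{u|_{P\backslash S}}\text{WeCURE}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)=\partial_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda\partial_{u|_{P\backslash S}}\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2}.$$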

Note that

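$$\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right\|_{2}^{2}=\left\|\sqrt{D}\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\sqrt{D}\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right]\right\|_{2}^{2}.$$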

Thus

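$$\partial_{u|_{P\backslash S}}\text{WeCURE}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)=\partial_{u|_{P\backslash S}}\text{WNLL}\left(\left[\begin{array}[]{cc}u|_{P\backslash S}\\ g\end{array}\right]\right)+\lambda GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\lambda GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right].$$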

Then, the solution to problem (4) is obtained by solving the following Euler-Lagrange equation

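$$\begin{array}[]{ll}\left(GL\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\gamma\cdot DW\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]+\lambda GL^{T}\cdot D\cdot GL\cdot\left[\begin{array}[]{cc}u|_{P\backslash S}\\ 0\end{array}\right]\right)({\bm{x}})&\\ \hskip 21.68121pt=\sum_{\bm{y}\in S}w(\bm{x},\bm{y})g(\bm{y})+\gamma\sum_{\bm{y}\in S}w(\bm{y},\bm{x})g(\bm{y})-\lambda\left(GL^{T}\cdot D\cdot GL\left[\begin{array}[]{cc}0\\ g\end{array}\right]\right)({\bm{x}}),&\bm{x}\in P\backslash S,\end{array}$$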

where $DW=\text{diag}(w_{1},w_{2},\ldots,w_{|P|})$ with $w_{i}=\sum_{\bm{y}\in S}w(\bm{x}_{i},\bm{y})$, and $\gamma$ is the weighting coefficient in WNLL. The above linear system is sparse, symmetric and positive definite, so it can be solved efficiently by iterative solvers such as the conjugate gradient method. We remark that, for the (non-weighted) CURE method, we only need to replace the matrix $D$ above by the identity matrix $Id_{|P|\times|P|}$. We summarize the (We)CURE algorithm for missing data recovery in Algorithm 1.
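A minimal dense sketch of the constrained solve is given below, assuming a symmetric weight matrix and a small number of points. It does not reproduce the paper's $\gamma$/$DW$ form of the Euler-Lagrange equation; instead it writes the WeCURE energy as a quadratic form $u^{\top}Hu$ (the WNLL part becomes an ordinary graph Laplacian of the symmetrized weights $w(\bm{x},\bm{y})(d_{\bm{x}}+d_{\bm{y}})/2$) and solves the reduced system for the unknowns on $P\backslash S$ with a textbook conjugate gradient loop. Function names are hypothetical; a practical implementation would use sparse matrices.

```python
import numpy as np

def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Textbook conjugate gradient for a symmetric positive definite matrix a."""
    max_iter = 50 * len(b) if max_iter is None else max_iter
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - a @ x for x = 0
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        ap = a @ p
        alpha = rs / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def wecure_interpolate(w, labeled, g, lam, weighted=True):
    """Minimize WNLL(u) + lam * ||sqrt(D) GL u||_2^2 subject to u = g on S.

    w: symmetric (n, n) weights; labeled: boolean mask of S; g: values on S in
    increasing index order. weighted=False sets every d_i = 1, which gives
    ||grad_P u||^2 + lam * ||GL u||^2 (unweighted CURE, with lam rescaled).
    """
    n = w.shape[0]
    d = np.where(labeled, n / labeled.sum(), 1.0) if weighted else np.ones(n)
    # WNLL(u) = sum_{x,y} (d_x / 2) w(x,y) (u(x)-u(y))^2 = u^T L_s u, where L_s is the
    # graph Laplacian of the symmetrized weights w(x,y) (d_x + d_y) / 2.
    ws = w * (d[:, None] + d[None, :]) / 2
    l_s = np.diag(ws.sum(axis=1)) - ws
    gl = np.diag(w.sum(axis=1)) - w
    h = l_s + lam * gl.T @ (d[:, None] * gl)   # energy = u^T h u
    free = ~labeled
    u = np.empty(n)
    u[labeled] = g
    # Zero gradient with respect to u on P \ S: h_FF u_F = -h_FS g.
    u[free] = conjugate_gradient(h[np.ix_(free, free)], -h[np.ix_(free, labeled)] @ g)
    return u
```

Because $H$ annihilates constant vectors, a constant $g$ is reproduced on $P\backslash S$ up to the solver tolerance, which makes a convenient correctness check.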

3 CURE for Semi-Supervised Learning

Semi-supervised learning is a challenging yet frequently encountered machine learning task. It can be formulated as a missing data recovery problem [50]. Given a data set $P=\{\bm{p}_{1},\ldots,\bm{p}_{n}\}\subset\mathbb{R}^{d}$, we assume there are $l$ different classes in total. Let $S\subset P$ be the subset of $P$ with labels, i.e.,

$$S=\bigcup_{i=1}^{l}S_{i},$$

where $S_{i}\subset P$ is the subset with label $i$. In semi-supervised learning, typically $|S|\ll|P|$. The objective is to extend the labels to the entire data set $P$. Our algorithm is summarized in Algorithm 2.
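Algorithm 2 itself is not reproduced in this text. A plausible one-vs-rest sketch, based on the indicator functions $\phi_{i}$ and the rule $L(\bm{x})=\arg\max_{1\leq i\leq l}\phi_{i}(\bm{x})$ listed among the paper's equations, is shown below: each $\phi_{i}$ is set to 1 on $S_{i}$ and 0 on the rest of $S$, extended to $P$ by unweighted CURE, and each point takes the class with the largest value. The dense direct solve, the function names and the toy two-cluster data are assumptions made for brevity.

```python
import numpy as np

def cure_extension(w, labeled, g, lam):
    """Unweighted CURE with u = g on S: minimize u^T GL u + (lam / 2) ||GL u||^2.

    Uses a dense direct solve, so it is only meant for small examples.
    """
    gl = np.diag(w.sum(axis=1)) - w
    h = gl + 0.5 * lam * gl.T @ gl
    free = ~labeled
    u = np.empty(w.shape[0])
    u[labeled] = g
    u[free] = np.linalg.solve(h[np.ix_(free, free)], -h[np.ix_(free, labeled)] @ g)
    return u

def propagate_labels(w, labeled_idx, labels, num_classes, lam=1.0):
    """phi_i = 1 on S_i and 0 on the other labeled points; L(x) = argmax_i phi_i(x)."""
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels)
    labeled = np.zeros(w.shape[0], dtype=bool)
    labeled[labeled_idx] = True
    # Reorder the labels so they follow increasing point index, like the mask.
    labels_sorted = labels[np.argsort(labeled_idx)]
    phi = np.stack([
        cure_extension(w, labeled, (labels_sorted == i).astype(float), lam)
        for i in range(num_classes)
    ])
    return np.argmax(phi, axis=0)

# Two well-separated clusters with one labeled point each.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
w = np.exp(-np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1))
pred = propagate_labels(w, [0, 10], [0, 1], num_classes=2)
print(pred)
```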

We test WNLL, Weighted Nonlocal Total Variation (WNTV) [30], CURE and WeCURE on the MNIST dataset [29] for handwritten digit classification [8], the COIL20 dataset [Nene96columbiaobject] for object classification and the ISOLET dataset [21] for spoken letter recognition. Some sample images from MNIST and COIL20 are shown in Figure 3. The MNIST dataset contains 70,000 gray-scale images of size 28 $\times$ 28 in 10 classes, the digits 0 to 9. Each class contains about 7,000 images. Each image can be seen as a point in a 784-dimensional Euclidean space. The COIL20 dataset contains 20 objects, and each object has 72 images. Each image is 32 $\times$ 32 pixels with 256 gray levels per pixel, so it is represented by a 1024-dimensional vector. The ISOLET dataset was collected from 150 subjects who spoke the name of each letter of the alphabet twice. The speakers are grouped into sets of 30 speakers each, referred to as isolet1 through isolet5. In our experiment, we use isolet1, which consists of 1560 samples, each represented by a 617-dimensional vector.

The weight function $w(\bm{x},\bm{y})$ is constructed as

$$w(\bm{x},\bm{y})=\exp\left(-\frac{\|\bm{x}-\bm{y}\|^{2}}{\sigma(\bm{x})^{2}}\right), \tag{6}$$

where $\sigma(\bm{x})$ is chosen to be the distance between $\bm{x}$ and its $k$-th nearest neighbor ($k=20$ in MNIST, $k=15$ in COIL20 and ISOLET). To make the weight matrix sparse, the weight $w(\bm{x},\bm{y})$ is truncated to the 50 nearest neighbors.
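The weight construction can be sketched as follows, assuming that a point is not counted as its own neighbor and using brute-force distances instead of a nearest neighbor search structure. The defaults `k_sigma=20` and `k_trunc=50` follow the MNIST setting ($k=15$ would be used for COIL20 and ISOLET). Because $\sigma(\bm{x})$ depends on $\bm{x}$ and truncation is done row by row, the resulting matrix is generally not symmetric. The function name is hypothetical.

```python
import numpy as np

def knn_gaussian_weights(points, k_sigma=20, k_trunc=50):
    """w(x, y) = exp(-||x - y||^2 / sigma(x)^2), kept only for the k_trunc nearest y.

    points: (n, d) array of distinct points with n > k_trunc. sigma(x) is the
    distance from x to its k_sigma-th nearest neighbor, not counting x itself.
    """
    n = len(points)
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    order = np.argsort(sq_dist, axis=1)          # order[:, 0] is the point itself
    idx = np.arange(n)
    sigma_sq = sq_dist[idx, order[:, k_sigma]]   # sigma(x)^2
    nbrs = order[:, 1:k_trunc + 1]               # the k_trunc nearest neighbors
    w = np.zeros((n, n))
    w[idx[:, None], nbrs] = np.exp(-sq_dist[idx[:, None], nbrs] / sigma_sq[:, None])
    return w
```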

In our test on MNIST, we choose five different sampling rates to form the training set: labeling 700, 100, 70, 50 and 35 images in the whole dataset at random. For each sampling rate, we repeat the test 10 times. In our test on COIL20 and ISOLET, we choose three different sampling rates: labeling $2\%$, $5\%$ and $10\%$ of the data at random. For each sampling rate, we repeat the test 10 times. Figure 4 shows the success rates of the WNLL, CURE and WeCURE methods on the MNIST dataset. The first five panels of Figure 4 show the success rate for each sampling rate, while the last panel shows the average success rate for each of the five sampling rates. The proposed CURE and WeCURE outperform WNLL in all the tested cases. At a high sampling rate, WeCURE is comparable with CURE, whereas WeCURE outperforms CURE at lower sampling rates. In terms of average success rate, both CURE and WeCURE outperform WNLL. We also compare (We)CURE with WNLL and WNTV [30] in Table 1: (We)CURE significantly outperforms both WNLL and WNTV at the lower sampling rates (50/70000 and 100/70000). Table 2 shows the results on the COIL20 and ISOLET datasets, where WeCURE outperforms CURE and WNLL by roughly 3 to 4 percentage points.

4 CURE for Image Inpainting

In this section, we apply (We)CURE to reconstruct the images with partially observed pixels. We adopt the assumption that image patches lie on a low dimensional and smooth manifold. Given an image $f\in\mathbb{R}^{m\times n}$ , for any $(i,j)\in\{1,2,\ldots,m\}\times\{1,2,\ldots,n\}$ , we define an $s_{1}\times s_{2}$ image patch as

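$$p_{ij}(f)=\left\{f(\tilde{i},\tilde{j}):i-\frac{s_{1}-1}{2}\leq\tilde{i}\leq i+\frac{s_{1}-1}{2},\ j-\frac{s_{2}-1}{2}\leq\tilde{j}\leq j+\frac{s_{2}-1}{2}\right\},$$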

where we assume $s_{1}$ and $s_{2}$ are odd integers and we adopt reflective boundary conditions for $(i,j)$ near image boundary. Define the patch set $P(f)$ as the collection of all patches:

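$$P(f)=\left\{p_{ij}(f):(i,j)\in\{1,2,\dots,m\}\times\{1,2,\dots,n\}\right\}\subset\mathbb{R}^{d},\quad d=s_{1}\cdot s_{2}.$$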

Define a function $u$ on $P(f)$ as

$$u(p_{ij}(f))=f(i,j),$$

where $f(i,j)$ is the intensity of image $f$ at pixel $(i,j)$ .
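The patch set and the function $u$ can be built with NumPy's `sliding_window_view` (available since NumPy 1.20). The sketch below assumes that NumPy's `"symmetric"` padding mode is an acceptable reading of "reflective boundary conditions" (the `"reflect"` mode, which does not repeat the edge pixel, is the other option). Rows are ordered so that row $i\cdot n+j$ (0-based) holds $p_{ij}(f)$, and $u$ is simply the center entry of each patch. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_set(f, s1=11, s2=11):
    """Return P(f) as an (m*n, s1*s2) array and u, the value of u on each patch.

    Row i*n + j (0-based) holds p_ij(f), the s1 x s2 patch centered at pixel (i, j).
    """
    assert s1 % 2 == 1 and s2 % 2 == 1, "patch sizes must be odd"
    r1, r2 = (s1 - 1) // 2, (s2 - 1) // 2
    padded = np.pad(f, ((r1, r1), (r2, r2)), mode="symmetric")  # boundary handling
    windows = sliding_window_view(padded, (s1, s2))   # shape (m, n, s1, s2)
    patches = windows.reshape(-1, s1 * s2)            # points in R^d, d = s1 * s2
    u = patches[:, (s1 * s2) // 2]                    # u(p_ij(f)) = f(i, j), the center
    return patches, u
```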

Now, suppose we only observe the image on a subset of pixels $\Omega\subset\{(i,j):1\leq i\leq m,1\leq j\leq n\}$. We would like to recover the entire image $f$ from the observed data $f|_{\Omega}$. This problem can be recast as the interpolation of the function $u$ on the patch set $P(f)$ with $u$ given on $S\subset P(f)$, $S=\{p_{ij}(f):(i,j)\in\Omega\}$. This falls into the general algorithmic framework of (We)CURE for missing data recovery (Algorithm 1). Notice that the patch set $P(f)$ is unknown, so we need to update it iteratively. We summarize the (We)CURE algorithm for this problem in Algorithm 3.

The weight function $w(\bm{x},\bm{y})$ is chosen as (6). Here, $x,y\in\mathbb{R}^{d+2}$ are semi-local patches and $\sigma(\bm{x})$ is chosen to be the distance between $\bm{x}$ and its 20th nearest neighbor. To make the weight matrix sparse, the weight is truncated to the 50 nearest neighbors. In the semi-local patches, the local coordinate is normalized to have the same amplitude as the image intensity,

$$(\bar{P}u)(x)=\left[(Pu)(x),\ \lambda\bar{x}\right]$$

with

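$$\bar{x}=\left(\frac{x_{1}\|f|_{\Omega}\|_{\infty}}{m},\ \frac{x_{2}\|f|_{\Omega}\|_{\infty}}{n}\right),$$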

where $x=(x_{1},x_{2})$ and $m,n$ are the dimensions of the image. The purpose of introducing semi-local patches is to restrict the search space to a local area. A larger $\lambda$ leads to a smaller search space, which makes the search faster, while a smaller $\lambda$ leads to a more global search and more accurate results. Thus, following [42], we initialize $\lambda=10$ and gradually reduce it by $\lambda^{k+1}=\max(\lambda^{k}-1,3)$.
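The semi-local patches and the $\lambda$ schedule can be sketched as below. We assume 1-based pixel coordinates $x=(x_{1},x_{2})$, patches ordered row by row (row $(i-1)n+(j-1)$ holds the patch of pixel $(i,j)$), and a hypothetical number of outer iterations `steps`, which the text does not specify. Function names are hypothetical.

```python
import numpy as np

def semi_local_patches(patches, m, n, f_obs_max, lam):
    """Append lam * x_bar to each patch, giving points in R^{d+2}.

    patches: (m*n, d) array ordered row by row over pixels (i, j), 1 <= i <= m, 1 <= j <= n;
    f_obs_max: ||f|_Omega||_inf, the largest observed intensity magnitude.
    """
    i, j = np.meshgrid(np.arange(1, m + 1), np.arange(1, n + 1), indexing="ij")
    x_bar = np.stack([i.ravel() * f_obs_max / m, j.ravel() * f_obs_max / n], axis=1)
    return np.hstack([patches, lam * x_bar])

def lambda_schedule(lam0=10, floor=3, steps=10):
    """lam^{k+1} = max(lam^k - 1, floor), starting from lam0 (steps values in total)."""
    lams = [lam0]
    for _ in range(steps - 1):
        lams.append(max(lams[-1] - 1, floor))
    return lams

print(lambda_schedule())  # [10, 9, 8, 7, 6, 5, 4, 3, 3, 3]
```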

We apply our algorithm to 12 widely used test images. In our experiments, the patch size is $11\times 11$. For each patch, the nearest neighbors are found with an approximate nearest neighbor (ANN) search; combining a k-d tree with ANN search reduces the computational cost. The linear systems arising from the weighted nonlocal Laplacian and the graph Laplacian are solved by the conjugate gradient method. We use the output of WNLL after 6 steps to initialize our algorithm, which provides a reasonable initial guess of the similarity relationships between different groups. The initial image for WNLL is obtained by filling the missing pixels with random numbers drawn from the Gaussian distribution $\mathcal{N}(\mu_{0},\sigma_{0}^{2})$, where $\mu_{0}$ is the mean of $f|_{\Omega}$ and $\sigma_{0}$ is the standard deviation of $f|_{\Omega}$.
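The random initialization of the missing pixels can be sketched as below. The uniformly random observation mask produced by the hypothetical helper `random_mask` is an assumption about how pixels are subsampled; `gaussian_fill` draws the missing values from $\mathcal{N}(\mu_{0},\sigma_{0}^{2})$ as described above, using the population standard deviation of $f|_{\Omega}$.

```python
import numpy as np

def random_mask(shape, rate, seed=None):
    """Hypothetical sampling model: keep each pixel independently with probability `rate`."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < rate

def gaussian_fill(f, mask, seed=None):
    """Fill the pixels outside Omega (mask == False) with N(mu0, sigma0^2) samples,
    where mu0 and sigma0 are the mean and standard deviation of f on Omega."""
    rng = np.random.default_rng(seed)
    observed = f[mask]
    mu0, sigma0 = observed.mean(), observed.std()
    g = f.astype(float)                                   # astype returns a copy
    g[~mask] = rng.normal(mu0, sigma0, size=int(np.count_nonzero(~mask)))
    return g
```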

Quality of the restored images is measured by PSNR and SSIM. PSNR is defined as

[TABLE]

where $f^{*}$ is the ground truth. SSIM is defined as the product of three terms that quantify the similarity of luminance, contrast and structure. It takes the following form

[TABLE]

where

[TABLE]

where $\mu_{x},\mu_{y},\sigma_{x},\sigma_{y}$ and $\sigma_{xy}$ are the local means, standard deviations and cross-covariance of images $x$ and $y$.
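For reference, a minimal sketch of both quality measures follows. It assumes 8-bit images (peak value 255) and the commonly used SSIM constants $C_{1}=(0.01L)^{2}$, $C_{2}=(0.03L)^{2}$, $C_{3}=C_{2}/2$, since the constants in the formulas above are not reproduced here. For brevity, the hypothetical `ssim_global` evaluates the SSIM expression on a single window covering the whole image, whereas standard SSIM implementations average it over local windows; the two generally give different numbers.

```python
import numpy as np

def psnr(f, f_star, peak=255.0):
    """PSNR in dB of f against the ground truth f_star, intensities in [0, peak]."""
    err = np.asarray(f, dtype=float) - np.asarray(f_star, dtype=float)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Luminance * contrast * structure, evaluated on one window = the whole image."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = np.mean((x - mx) * (y - my))                    # cross-covariance
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (sxy + c3) / (sx * sy + c3)
    return float(luminance * contrast * structure)
```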

The numerical results are shown in Table 3 and Table 4. For qualitative comparison, Figure 6 shows the inpainting results for 3 images from the Set12 dataset at a $15\%$ sampling rate, and Figure 7 shows the results at a $20\%$ sampling rate. WeCURE gives much better results than WNLL, both visually and in terms of PSNR and SSIM. (We)CURE recovers textures well and preserves sharp image features such as edges, which is why it significantly outperforms WNLL in terms of SSIM, although it also introduces mild artifacts in smooth regions.

5 Asymptotic Analysis

In this section, we provide an asymptotic analysis of the proposed numerical scheme for the WeCURE model using $\Gamma$-convergence. The proof proceeds in two steps. We first fix the bandwidth of the kernel and regard our scheme as a discretization of a nonlocal integral functional. We then let the bandwidth tend to zero to show that the nonlocal functional is a good approximation of the original WeCURE functional. The proof mostly follows the notation and general ideas of [45, 46, trillos2018error]. A recent paper [dunlop2019large] also gave a $\Gamma$-convergence proof for the biharmonic equation. The main difference between their work and ours lies in the problem setting. In their paper, the labeled data are treated as a boundary condition, whereas in our paper the labeled data are also regarded as samples from the data distribution, and the ratio between the numbers of labeled and unlabeled points is fixed. In this setting, we show that the weights of WeCURE are crucial for convergence.

Let $P=\{x_{1},x_{2},\cdots,x_{n}\}$, where the points $x_{i}$ $(1\leq i\leq n)$ are sampled uniformly from $\Omega$, an open bounded domain in $\mathbb{R}^{d}$. Let $\{x_{i_{1}},x_{i_{2}},\cdots,x_{i_{m}}\}$ be the set of labeled points, where each $x_{i_{j}}$ $(1\leq j\leq m)$ is sampled uniformly from $P$. Throughout, the ratio $\gamma=\frac{n}{m}$ is kept fixed. Let $b:\Omega\rightarrow\mathbb{R}$ be a function whose values are known only at the labeled points. The empirical measure of the data points is $\mu_{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}}$. We consider a graph with vertex set $V=P$ and edge weights $W_{ij}=\eta_{\varepsilon}(x_{i}-x_{j})$, where $\eta_{\varepsilon}(x):=\eta_{\varepsilon}(|x|)=\frac{1}{\varepsilon^{d}}\eta(\frac{|x|}{\varepsilon})$ is radially symmetric and the profile $\eta:[0,+\infty)\rightarrow[0,+\infty)$ satisfies the following assumptions:

(A1) $\eta(0)>0$ and $\eta$ is continuous at 0.

(A2) $\eta$ is non-increasing.

(A3) $\eta$ has compact support. If $|r|>\alpha$ , then $\eta(r)=0$ .
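The graph construction above can be sketched in a few lines. As assumptions for illustration only, we take $\Omega=(0,1)^{d}$, an integer ratio $\gamma$, labeled indices drawn without replacement, and the indicator kernel $\eta(r)=\mathbf{1}_{\{r\leq 1\}}$, which satisfies (A1)-(A3) with $\alpha=1$; the helper names are hypothetical.

```python
import numpy as np

def eta_indicator(r, alpha=1.0):
    """eta(r) = 1 for r <= alpha and 0 otherwise: satisfies (A1)-(A3)."""
    return (np.asarray(r) <= alpha).astype(float)

def build_graph(n, gamma, eps, d=2, seed=0):
    """Sample P uniformly in (0,1)^d, pick m = n // gamma labelled indices, and
    form W_ij = eta_eps(|x_i - x_j|) = eps^{-d} * eta(|x_i - x_j| / eps)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n, d))                                # uniform samples in (0,1)^d
    m = n // gamma                                        # number of labelled points
    labelled = rng.choice(n, size=m, replace=False)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    W = eta_indicator(dist / eps) / eps ** d              # zero beyond distance alpha * eps
    return x, labelled, W
```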

The discrete WeCURE model is given by (note that the weight here is $\gamma-1$ rather than the $\gamma$ used in previous sections)

[TABLE]

The continuum nonlocal WeCURE model is given by

[TABLE]

The continuum (local) WeCURE model is given by

[TABLE]

where $\sigma_{\eta}=\frac{1}{2}\int_{\mathbb{R}^{d}}\eta(h)|h_{1}|^{2}dh$ , $h_{1}$ is the first coordinate of vector $h$ .
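The constant $\sigma_{\eta}$ can be checked numerically. The sketch below uses a midpoint rule on $[-1,1]^{d}$, assuming $\eta$ is supported in $[0,1]$ (as for the indicator kernel with $\alpha=1$). For $\eta(r)=\mathbf{1}_{\{r\leq 1\}}$ in $d=2$, polar coordinates give $\int_{B(0,1)}h_{1}^{2}\,dh=\int_{0}^{2\pi}\int_{0}^{1}r^{3}\cos^{2}\theta\,dr\,d\theta=\pi/4$, hence $\sigma_{\eta}=\pi/8\approx 0.3927$.

```python
import numpy as np

def sigma_eta(eta, d=2, N=1000):
    """Midpoint-rule estimate of sigma_eta = 1/2 * int eta(|h|) h_1^2 dh.
    Assumes supp(eta) is contained in [0, 1]; the cost grows like N**d."""
    t = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)          # cell midpoints in [-1, 1]
    grids = np.meshgrid(*([t] * d), indexing="ij")
    r = np.sqrt(sum(g ** 2 for g in grids))               # |h| at every grid point
    cell = (2.0 / N) ** d                                 # volume of one cell
    return float(0.5 * np.sum(eta(r) * grids[0] ** 2) * cell)
```

Calling `sigma_eta(lambda r: (r <= 1.0).astype(float))` returns a value close to $\pi/8$.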

Remark 5.1.

*The models introduced above contain the corresponding CURE models as special cases: it suffices to modify some coefficients in the WeCURE models and to replace the term $\int_{\Omega}|\nabla(u(x)-b(x))|^{2}dx$ by $\int_{\Omega}|\nabla(u(x)-c\cdot b(x))|^{2}dx$, where $c\neq 1$ is a certain constant.*

We are now ready to present the main theorems of this section.

Theorem 5.2.

*Let $\Omega\subset\mathbb{R}^{d}$, $d\geq 2$, be an open, bounded, connected set with Lipschitz boundary. Let $x_{1},\cdots,x_{n},\cdots$ be a sequence of i.i.d. random points sampled uniformly from $\Omega$, and let $S=\{x_{i_{1}},x_{i_{2}},\cdots,x_{i_{m}}:x_{i_{j}}\text{ uniformly sampled from }x_{1},\cdots,x_{n},\cdots\}$ be the set of labeled points, whose values are given by $b(x_{i_{j}})$. Assume the kernel $\eta$ satisfies conditions (A1)-(A3). Then $WeCURE_{n,\varepsilon}$ $\Gamma$-converges to $WeCURE_{\varepsilon}$ as $n\rightarrow\infty$ in the $TL^{2}$ sense.*

Theorem 5.3.

*Under the assumptions of Theorem 5.2, $WeCURE_{\varepsilon}$ $\Gamma$-converges to $WeCURE$ as $\varepsilon\rightarrow 0$ in $H_{0}^{2}(\Omega)$ with the $L^{2}(\Omega)$ metric.*

Theorem 5.4.

*(Compactness) Under the assumptions of Theorem 5.2, $\{WeCURE_{\varepsilon}\}_{\varepsilon>0}$ satisfies the compactness property with respect to the $L^{2}(\Omega)$ metric.*

The complete proofs of Theorems 5.2 and 5.3 are given in Sections A.4 and A.5, respectively, and Theorem 5.4 is a direct consequence of [bourgain2001another, Theorem 4].

6 Conclusion and Future Work

In this paper, we proposed to use both the low dimensionality and the smoothness of the underlying data manifold as a regularizer for missing data recovery. To this end, we introduced curvature regularization (CURE) and a weighted version of it (WeCURE). Compared with related models such as LDMM, WNLL, and WNTV, the new regularization was shown numerically to be more effective for semi-supervised learning and image inpainting on some datasets.

There are plenty of future directions worth exploring. For modelling, a natural question is whether other curvatures can also serve as good smoothing regularizers for data manifolds, and how they differ from the one we chose for CURE. Can these curvatures be computed easily? How does CURE work for other missing data recovery tasks? Furthermore, the convergence of numerical methods for solving the biharmonic equation (5) on manifolds remains to be studied. Our limited understanding of numerical methods for the biharmonic equation currently prevents us from generalizing CURE to generic inverse problems.

Acknowledgments

Bin Dong is supported in part by NSFC 11671022 and Beijing Natural Science Foundation (Z180001). Haocheng Ju is supported by the Elite Undergraduate Training Program of the School of Mathematical Sciences at Peking University. Zuoqiang Shi is supported by NSFC 11671005. We would also like to thank Dr. Wei Zhu for his valuable comments and for kindly sharing the code of both LDMM and LDMM+WNLL for comparison.

Appendix A Preliminaries

In this section, we briefly review some basic concepts used in the asymptotic analysis. Interested readers may consult [45] for a more detailed introduction to these concepts.

A.1 Optimal transport

Let $\Omega$ be an open and bounded domain in $\mathbb{R}^{d}$, let $\mathscr{B}(\Omega)$ be the Borel $\sigma$-algebra of $\Omega$, and let $\mathscr{P}(\Omega)$ be the set of all Borel probability measures on $\Omega$. Given $1\leq p<\infty$, the $p$-OT distance between $\mu,\hat{\mu}\in\mathscr{P}(\Omega)$ is defined by:

[TABLE]

where $\Gamma(\mu,\hat{\mu})$ is the set of all Borel probability measures on $\Omega\times\Omega$ whose marginal on the first variable is $\mu$ and whose marginal on the second variable is $\hat{\mu}$. The elements $\pi\in\Gamma(\mu,\hat{\mu})$ are also referred to as transportation plans between $\mu$ and $\hat{\mu}$. When $p=\infty$,

[TABLE]

defines a metric on $\mathscr{P}(\Omega)$ , which is called the $\infty$ -transportation distance.
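In one dimension, the $p$-OT distance between two empirical measures with the same number of equally weighted atoms has a closed form: for $1\leq p\leq\infty$, matching the sorted atoms (the monotone rearrangement) is an optimal transportation plan. The following sketch uses this classical fact; the function name is hypothetical, and the shortcut does not apply in higher dimensions, where a general OT solver is needed.

```python
import numpy as np

def ot_distance_1d(x, y, p=2.0):
    """p-OT distance between (1/n) sum_i delta_{x_i} and (1/n) sum_i delta_{y_i} on R."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both measures must have the same number of atoms")
    gaps = np.abs(x - y)              # cost of matching the k-th smallest atoms
    if np.isinf(p):
        return float(gaps.max())      # infinity-transportation distance
    return float(np.mean(gaps ** p) ** (1.0 / p))
```

For example, the atoms $\{0,3\}$ and $\{1,1\}$ are matched as $0\mapsto 1$, $3\mapsto 1$, giving $d_{2}=\sqrt{2.5}$ and $d_{\infty}=2$.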

Given a Borel map $T:\Omega\rightarrow\Omega$ and $\mu\in\mathscr{P}(\Omega)$, the push-forward of $\mu$ by $T$, denoted by $T_{\sharp}\mu\in\mathscr{P}(\Omega)$, is given by:

[TABLE]

Then for any bounded Borel function $\varphi:\Omega\rightarrow\mathbb{R}$ the following change of variables in the integral holds:

[TABLE]

When the measure $\mu\in\mathscr{P}(\Omega)$ is absolutely continuous with respect to the Lebesgue measure, (13) is equivalent to:

[TABLE]
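A short worked example of the change of variables, chosen here for illustration only: let $\Omega=(0,1)$, let $\mu$ be the uniform (Lebesgue) measure on $\Omega$, and let $T(x)=x^{2}$. Since $\mu(T^{-1}((0,a]))=\mu((0,\sqrt{a}])=\sqrt{a}$, the push-forward $T_{\sharp}\mu$ has density $\rho(y)=\frac{1}{2\sqrt{y}}$ on $(0,1)$. Taking $\varphi(y)=y$, both sides of the change of variables agree:

$$\int_{\Omega}\varphi(y)\,d(T_{\sharp}\mu)(y)=\int_{0}^{1}\frac{y}{2\sqrt{y}}\,dy=\frac{1}{3}=\int_{0}^{1}x^{2}\,dx=\int_{\Omega}\varphi(T(x))\,d\mu(x).$$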

A.2 The $TL^{p}$ Space

The $TL^{p}$ space was introduced in [45] to compare functions defined on $\Omega_{n}=\{x_{i}:i=1,\cdots,n\}$ with functions defined on an open domain $\Omega$.

[TABLE]

The metric on the space is

[TABLE]

where $\Gamma(\mu,\nu)$ is the set of transportation plans defined in the previous subsection. When the measure $\mu\in\mathscr{P}(\Omega)$ is absolutely continuous with respect to the Lebesgue measure, (19) is equivalent to:

[TABLE]
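For two uniform empirical measures with the same number $n$ of points, and assuming the metric takes the standard form $\inf_{\pi}\big(\iint|x-y|^{p}+|u(x)-v(y)|^{p}\,d\pi\big)^{1/p}$ of [45], the minimization over transportation plans is a linear program over (scaled) doubly stochastic matrices. By the Birkhoff-von Neumann theorem, an optimal plan can therefore be taken to be a permutation. The sketch below exploits this by brute force over permutations, which is exact but only feasible for very small $n$; in practice an assignment solver such as SciPy's `linear_sum_assignment` would be used instead. The function name `tlp_distance` is hypothetical.

```python
import itertools
import numpy as np

def tlp_distance(x, u, y, v, p=2.0):
    """TL^p distance between (mu_n, u) and (nu_n, v) for uniform empirical
    measures on the n points x and y (arrays of shape (n,) or (n, d))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(x)
    if len(y) != n or len(u) != n or len(v) != n:
        raise ValueError("both measures need the same number n of points")
    best = np.inf
    for perm in itertools.permutations(range(n)):
        idx = list(perm)
        # transport cost |x - y|^p plus mismatch of function values |u - v|^p,
        # each matched pair carrying mass 1/n
        cost = np.mean(np.linalg.norm(x - y[idx], axis=1) ** p + np.abs(u - v[idx]) ** p)
        best = min(best, cost)
    return float(best ** (1.0 / p))
```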

A.3 $\Gamma$ -Convergence

We follow the definition of $\Gamma$-convergence in a random setting given in [slepcev2019analysis].

Definition A.1.

Let $(Z,d)$ be a metric space and $(\mathcal{X},\mathbb{P})$ be a probability space. For each $\omega\in\mathcal{X}$, the functional $E_{n}^{(\omega)}:Z\rightarrow\mathbb{R}\cup\{\pm\infty\}$ is a random variable. We say that $E_{n}^{(\omega)}$ $\Gamma$-converges almost surely on the domain $Z$ to $E_{\infty}:Z\rightarrow\mathbb{R}\cup\{\pm\infty\}$ with respect to $d$, and write $E_{\infty}=\Gamma\text{-}\lim_{n\rightarrow\infty}E_{n}^{(\omega)}$, if there exists a set $\mathcal{X}^{\prime}\subset\mathcal{X}$ with $\mathbb{P}(\mathcal{X}^{\prime})=1$ such that for all $\omega\in\mathcal{X}^{\prime}$ and all $f\in Z$:

(i) (liminf inequality) for every sequence $\{f_{n}\}_{n=1}^{\infty}$ converging to $f$,

[TABLE]

(ii) (limsup inequality) there exists a sequence $\{f_{n}\}_{n=1}^{\infty}$ converging to $f$ such that

[TABLE]
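A standard deterministic example (a special case of Definition A.1 in which $E_{n}$ does not depend on $\omega$, included here only for illustration) shows that a $\Gamma$-limit can differ from the pointwise limit. Take $Z=\mathbb{R}$ with the usual metric and $E_{n}(f)=\sin(nf)$:

$$\begin{aligned}
&\text{(i)}\quad E_{n}\geq -1\ \Rightarrow\ \liminf_{n\rightarrow\infty}E_{n}(f_{n})\geq -1\ \text{ for every } f_{n}\rightarrow f;\\
&\text{(ii)}\quad f_{n}=\frac{2\pi k_{n}-\pi/2}{n},\ \ k_{n}=\operatorname{round}\Big(\frac{nf+\pi/2}{2\pi}\Big)\ \Rightarrow\ |f_{n}-f|\leq\frac{\pi}{n},\quad E_{n}(f_{n})=-1.
\end{aligned}$$

Hence $\Gamma\text{-}\lim_{n\rightarrow\infty}E_{n}\equiv -1$, even at $f=0$, where $E_{n}(0)=0$ for every $n$.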

Definition A.2.

We say that the sequence of nonnegative functionals $\{F_{n}\}_{n\in\mathbb{N}}$ satisfies the compactness property if the following holds: given an increasing sequence $\{n_{k}\}_{k\in\mathbb{N}}$ of natural numbers and a bounded sequence $\{x_{k}\}_{k\in\mathbb{N}}$ in $X$ for which

[TABLE]

then $\{x_{k}\}$ is relatively compact in $X$.

A.4 Proof of Theorem 5.2

A.4.1 Liminf inequality

Proof A.3.

Assume that $u_{n}\xrightarrow{TL^{2}}u$ as $n\rightarrow\infty$ . First we show that

[TABLE]

Since $T_{\sharp}\nu=\nu_{n}$, using the change of variables (16), it follows that

[TABLE]

Notice that

[TABLE]

Moreover, we have

[TABLE]

and

[TABLE]

Note that $u_{n}\xrightarrow{TL^{2}}u$ implies $u_{n}\circ T_{n}\xrightarrow{L^{2}(\Omega)}u$, so the first two terms go to zero as $n\rightarrow\infty$. It therefore remains to show that

[TABLE]

Note that for almost every $(x,y)\in\Omega\times\Omega$

[TABLE]

along with the monotonicity of $\eta_{\varepsilon}$ , we have

[TABLE]

Note that from Theorem 2.5 in [45], we have

[TABLE]

along with the standard result in real analysis that if $f\in L^{p}(\mathbb{R}^{d})$ with $1\leq p<\infty$, then $\lim\limits_{h\rightarrow 0}\int_{\mathbb{R}^{d}}|f(r+h)-f(r)|^{p}dr=0$ (continuity of translations in $L^{p}$), we have

[TABLE]

Similarly, we can show that

[TABLE]

and we obtain (22) and $\lim\limits_{n\rightarrow\infty}a_{n}=0$. Together with

[TABLE]

we have

[TABLE]

The remaining terms can be treated in a similar way, and we have

[TABLE]

A.4.2 Limsup inequality

Proof A.4.

Define $u_{n}$ to be the restriction of $u$ to the first $n$ data points $X_{1},\cdots,X_{n}$; then $u_{n}\xrightarrow{TL^{2}}u$. From the proof of the liminf inequality in the previous subsection, we have

[TABLE]

A.5 Proof of Theorem 5.3

A.5.1 Liminf inequality

Proof A.5.

Consider an arbitrary $u\in H_{0}^{2}(\Omega)$ and suppose that $u_{\varepsilon}\xrightarrow{L^{2}(\Omega)}u$ as $\varepsilon\rightarrow 0$.

[TABLE]

The inequality

[TABLE]

follows from the proof of Theorem 8 in [ponce2004new]. Next, we show that

[TABLE]

We need the following lemma to establish the liminf inequality.

Lemma A.6.

Let $\Omega$ be a bounded open subset of $\mathbb{R}^{d}$ and let $\Omega^{\prime}$ be an open set compactly contained in $\Omega$. Suppose that $\{u_{\varepsilon}\}_{\varepsilon>0}$ is a family of $C^{4}$ functions such that

[TABLE]

if $\Delta u_{\varepsilon}\xrightarrow{L^{2}(\Omega)}\Delta u$ for some $u\in C^{4}(\mathbb{R}^{d})$ , then

[TABLE]

where $\sigma_{\eta}=\frac{1}{2}\int_{\mathbb{R}^{d}}\eta(h)|h_{1}|^{2}dh$ and $h_{1}$ is the first coordinate of the vector $h$.

Proof A.7.

We claim that

[TABLE]

Using a simple change of variables $h=\frac{y-x}{\varepsilon}$ , we have

[TABLE]

The second equality follows from the fact that $\Omega^{\prime}$ is compactly contained in $\Omega$. The third equality follows from a fourth-order Taylor expansion, in which the first- and third-order terms vanish because $\eta$ is radially symmetric. Combined with (33), this gives (35). Note that $\Delta u_{\varepsilon}\xrightarrow{L^{2}(\Omega)}\Delta u$ implies $\left\|\Delta u_{\varepsilon}\right\|_{L^{2}(\Omega^{\prime})}^{2}\rightarrow\left\|\Delta u\right\|_{L^{2}(\Omega^{\prime})}^{2}$ by the reverse triangle inequality. Letting $\varepsilon\rightarrow 0$ on the right-hand side of (35), we obtain (34).
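The kind of Taylor-expansion argument used above can be checked numerically. After the change of variables $h=(y-x)/\varepsilon$, the quantity $\frac{1}{\varepsilon^{2}}\int\eta_{\varepsilon}(y-x)(u(y)-u(x))\,dy=\frac{1}{\varepsilon^{2}}\int\eta(h)(u(x+\varepsilon h)-u(x))\,dh$ should approach $\sigma_{\eta}\Delta u(x)$ as $\varepsilon\rightarrow 0$: the first- and third-order terms integrate to zero by radial symmetry, and the second-order term equals $\frac{\varepsilon^{2}}{2}\sum_{i}\partial_{ii}u(x)\int\eta(h)h_{i}^{2}\,dh=\varepsilon^{2}\sigma_{\eta}\Delta u(x)$. The sketch below is an illustration, not part of the proof; it assumes $d=2$ and the indicator kernel $\eta=\mathbf{1}_{\{|h|\leq 1\}}$ (so $\sigma_{\eta}=\pi/8$), and uses a midpoint grid that is symmetric about $h=0$ so that the odd-order terms also cancel in the discrete sum.

```python
import numpy as np

def nonlocal_laplacian(u, x, eps, N=801):
    """(1/eps^2) * int eta(|h|) (u(x + eps*h) - u(x)) dh in d = 2, eta = 1_{|h|<=1}.
    `u` is a callable u(a, b) acting elementwise on arrays."""
    t = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)          # grid symmetric about 0
    h1, h2 = np.meshgrid(t, t, indexing="ij")
    inside = h1 ** 2 + h2 ** 2 <= 1.0                     # support of the kernel
    y1 = x[0] + eps * h1[inside]
    y2 = x[1] + eps * h2[inside]
    cell = (2.0 / N) ** 2                                 # area of one grid cell
    return float(np.sum(u(y1, y2) - u(x[0], x[1])) * cell / eps ** 2)
```

For $u(x)=\sin x_{1}+\cos x_{2}$ at $x=(0.3,0.2)$ and $\varepsilon=0.05$, the result agrees with $\frac{\pi}{8}\Delta u(x)=-\frac{\pi}{8}(\sin 0.3+\cos 0.2)$ to within a few thousandths.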

We can now proceed to the proof of the liminf inequality of Theorem 5.3. Our main idea follows [45]. Consider an arbitrary $u\in H_{0}^{2}(\Omega)$ and suppose that $u_{\varepsilon}\xrightarrow{L^{2}(\Omega)}u$ as $\varepsilon\rightarrow 0$. We want to show that $\liminf\limits_{\varepsilon\rightarrow 0}WeCURE_{\varepsilon}(u_{\varepsilon})\geq\sigma_{\eta}WeCURE(u)$. Without loss of generality, we assume that $\{WeCURE_{\varepsilon}(u_{\varepsilon})\}_{\varepsilon>0}$ is uniformly bounded.

Consider $J:\mathbb{R}^{d}\rightarrow[0,\infty)$ a standard mollifier. $J$ is a smooth radially symmetric function, supported in the closed unit ball $\overline{B(0,1)}$ and is such that $\int_{\mathbb{R}^{d}}J(z)dz=1$ . We define $J_{\delta}(z)=\frac{1}{\delta^{d}}J(\frac{z}{\delta})$ .
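The mollification step can be illustrated in one dimension. The sketch below takes as $J$ the classical bump $\exp(-1/(1-z^{2}))$ on $|z|<1$ (one common choice of standard mollifier, assumed here), normalizes it so that its discrete integral is one, and approximates $v_{\delta}(x)=\int J_{\delta}(x-z)v(z)\,dz$ on a uniform grid. Mollifying a step function shows the properties used below: $v_{\delta}$ is smooth, it coincides with $v$ outside a $\delta$-neighborhood of the jump, and $v_{\delta}\rightarrow v$ in $L^{2}$ as $\delta\rightarrow 0$. The helper names are hypothetical.

```python
import numpy as np

def bump(z):
    """Unnormalised bump exp(-1/(1 - z^2)) on |z| < 1, zero elsewhere."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    inside = np.abs(z) < 1
    out[inside] = np.exp(-1.0 / (1.0 - z[inside] ** 2))
    return out

def mollify(v, t, delta):
    """Discrete v_delta = J_delta * v on the uniform grid t (1D sketch)."""
    dt = t[1] - t[0]
    K = int(np.ceil(delta / dt))
    k = np.arange(-K, K + 1)
    J = bump(k * dt / delta)          # samples of J(z / delta) at the grid offsets
    J /= J.sum()                      # discrete normalisation: weights sum to 1
    return np.convolve(v, J, mode="same")   # J is symmetric, so this is J_delta * v
```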

Fix an open domain $\Omega^{\prime}$ compactly contained in $\Omega$. Let $\delta^{\prime}=\operatorname{dist}(\Omega^{\prime},\partial\Omega)$ and set $\Omega^{\prime\prime}=\{x\in\Omega:\operatorname{dist}(x,\partial\Omega)>\frac{\delta^{\prime}}{2}\}$, so that $\Omega^{\prime}\subset\subset\Omega^{\prime\prime}\subset\subset\Omega$. For $0<\delta<\frac{\delta^{\prime}}{2}$ and a given function $v\in L^{2}(\Omega)$, extended by zero outside $\Omega$, we define the mollified function $v_{\delta}\in L^{1}(\mathbb{R}^{d})$ by $v_{\delta}(x)=\int_{\mathbb{R}^{d}}J_{\delta}(x-z)v(z)dz=\int_{\mathbb{R}^{d}}J_{\delta}(z)v(x-z)dz$. The functions $v_{\delta}$ are smooth and satisfy $v_{\delta}\xrightarrow{L^{2}(\Omega^{\prime})}v$ as $\delta\rightarrow 0$. Furthermore,

[TABLE]

By taking the second derivative, it follows that there is a constant $C>0$ (only depending on the mollifier $J$ ) such that

[TABLE]

Since $u_{\varepsilon}\xrightarrow{L^{2}(\Omega)}u$ as $\varepsilon\rightarrow 0$, the norms $\left\|u_{\varepsilon}\right\|_{L^{2}(\Omega)}$ are uniformly bounded. Therefore, taking $v=u_{\varepsilon}$ in the inequalities (37) and setting $u_{\varepsilon,\delta}=(u_{\varepsilon})_{\delta}$, we obtain

[TABLE]

Moreover, using (36) to express $D^{2}u_{\varepsilon,\delta}$ and $D^{2}u_{\delta}$ , it is straightforward to deduce that

[TABLE]

for some constant $C$ independent of $\varepsilon$. In particular, $\int_{\Omega^{\prime}}(\Delta(u_{\varepsilon,\delta}-u_{\delta}))^{2}dx\rightarrow 0$ as $\varepsilon\rightarrow 0$, and hence we can apply Lemma A.6 to infer that

[TABLE]

The second inequality is obtained by the change of variables $\hat{x}=x+z$, since $\Omega^{\prime}$ is contained in the transformed domain. The third inequality follows from the Cauchy-Schwarz inequality. The third equality follows from the change of variables $\hat{y}=y+z$. The fourth equality holds because $\eta$ has compact support and $|z|\leq\delta<\frac{\delta^{\prime}}{2}$, so the integral over $\Omega^{\prime\prime}$ is the same as the integral over $\Omega+\{z\}$. Letting $\varepsilon\rightarrow 0$ and applying (38), we have

[TABLE]

Since $u_{\delta}\xrightarrow{L^{2}(\Omega^{\prime})}u$ as $\delta\rightarrow 0$ and the functional $v\mapsto\int_{\Omega^{\prime}}(\Delta v(x))^{2}dx$ is lower semicontinuous with respect to this convergence, we have

[TABLE]

Letting $\Omega^{\prime}\nearrow\Omega$, we obtain the desired liminf inequality. Next, we show that

[TABLE]

Since $\{WeCURE_{\varepsilon}(u_{\varepsilon})\}_{\varepsilon>0}$ is uniformly bounded, we have

[TABLE]

Using the nonlocal Green’s formula in [26], we have

[TABLE]

Substituting $u_{\varepsilon}-b$ into (31), we have

[TABLE]

Letting $\varepsilon\rightarrow 0$, it is straightforward to show that

[TABLE]

Summing up the four inequalities established above, we have

[TABLE]

A.5.2 Limsup inequality

Proof A.8.

By Remark 2.7 in [45], it suffices to prove the limsup inequality for $u\in C_{c}^{\infty}(\Omega)$. We want to prove

[TABLE]

The inequality

[TABLE]

follows from the proof of Theorem 8 in [ponce2004new]. Next, we show

[TABLE]

Let $\Omega_{\varepsilon}=\{x\in\Omega:dist(x,\partial\Omega)>\alpha\varepsilon\}$ .

[TABLE]

The first equality is obtained by setting $F(t)=v_{k}(x+t(y-x))-v_{k}(x)$ and using $F(1)-F(0)=\int_{0}^{1}\int_{0}^{t}F^{\prime\prime}(p)dpdt+F^{\prime}(0)$; the first-order term vanishes because $\eta$ is radially symmetric. Here $\nabla^{2}$ denotes the Hessian matrix. The first inequality is obtained by the change of variables $(y,x)\rightarrow(h,z)$, $h=\frac{y-x}{\varepsilon}$, $z=x+t(y-x)$, since the transformed domain is contained in $\Omega$. As $u$ is compactly supported, it is straightforward to show that

[TABLE]

then we have

[TABLE]

Similarly to the proof of inequality (42), we have

[TABLE]

Letting $\varepsilon\rightarrow 0$, it is straightforward to show that

[TABLE]

Summing up the four inequalities established above, we have

[TABLE]

Bibliography


[1] S. Agarwal, K. Branson, and S. Belongie, Higher order learning with graphs, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, pp. 17–24.
[2] C. Bao, B. Dong, L. Hou, Z. Shen, X. Zhang, and X. Zhang, Image restoration by minimizing zero norm of wavelet frame coefficients, Inverse Problems, 32 (2016), p. 115004.
[3] A. L. Bertozzi and A. Flenner, Diffuse interface models on graphs for classification of high dimensional data, Multiscale Modeling & Simulation, 10 (2012), pp. 1090–1118.
[4] K. Bredies, K. Kunisch, and T. Pock, Total generalized variation, SIAM Journal on Imaging Sciences, 3 (2010), pp. 492–526.
[5] A. Buades, B. Coll, and J.-M. Morel, Neighborhood filters and PDE's, Numer. Math., 105, pp. 1–34.
[6] A. Buades, B. Coll, and J.-M. Morel, A review of image denoising algorithms, with a new one, Multiscale Model. Simul., 4, pp. 490–530.
[7] A. Buades, B. Coll, and J.-M. Morel, A non-local algorithm for image denoising, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2, IEEE, 2005, pp. 60–65.
[8] C. Burges, Y. Le Cun, and C. Cortes, MNIST database.
