A Generalized Framework for Edge-preserving and Structure-preserving Image Smoothing

Wei Liu; Pingping Zhang; Yinjie Lei; Xiaolin Huang; Jie Yang; Ian Reid

arXiv:1907.09642·cs.GR·November 28, 2019

TL;DR

This paper introduces a versatile, non-convex optimization framework utilizing a truncated Huber penalty for diverse image smoothing tasks, outperforming existing methods with guaranteed convergence.

Contribution

It presents a novel non-convex, non-smooth optimization framework with a truncated Huber penalty for flexible, structure-preserving image smoothing, along with an efficient, convergent numerical solution.

Findings

1. Outperforms state-of-the-art smoothing methods in various tasks

2. Capable of achieving contradictory smoothing behaviors

3. Provides a theoretically guaranteed convergence proof

Tables

Table 1: Quantitative comparison on the noisy simulated ToF data. Results are evaluated in MAE. The best results are in bold. The second best results are underlined.

	Art				Book				Dolls				Laundry				Moebius				Reindeer
	$2 \times$	$4 \times$	$8 \times$	$16 \times$	$2 \times$	$4 \times$	$8 \times$	$16 \times$	$2 \times$	$4 \times$	$8 \times$	$16 \times$	$2 \times$	$4 \times$	$8 \times$	$16 \times$	$2 \times$	$4 \times$	$8 \times$	$16 \times$	$2 \times$	$4 \times$	$8 \times$	$16 \times$
TGV(?)	0.8	1.21	2.01	4.59	0.61	0.88	1.21	2.19	0.66	0.95	1.38	2.88	0.61	0.87	1.36	3.06	0.57	0.77	1.23	2.74	0.61	0.85	1.3	3.41
AR(?)	1.17	1.7	2.93	5.32	0.98	1.22	1.74	2.89	0.97	1.21	1.71	2.74	1	1.31	1.97	3.43	0.95	1.2	1.79	2.82	1.07	1.3	2.03	3.34
SG-WLS(?)	1.26	1.9	3.07	-	0.82	1.12	1.73	-	0.87	1.11	1.81	-	0.86	1.17	2	-	0.82	1.08	1.79	-	0.9	1.32	2.01	-
FGI(?)	0.9	1.37	2.46	4.89	0.66	0.85	1.23	1.96	0.74	0.95	1.41	2.13	0.71	0.99	1.59	2.67	0.67	0.82	1.2	1.87	0.75	0.94	1.55	2.73
SGF(?)	1.42	1.85	3.06	5.55	0.84	1.11	1.76	3.03	0.87	1.2	1.88	3.26	0.74	1.1	1.96	3.63	0.81	1.13	1.84	3.16	0.93	1.25	2.03	3.67
SD Filter(?)	1.16	1.64	2.88	5.52	0.86	1.1	1.57	2.68	1.04	1.27	1.73	2.76	0.96	1.25	1.94	3.54	0.93	1.14	1.68	2.75	1.05	1.31	1.99	3.43
FBS(?)	1.93	2.39	3.29	5.05	1.42	1.55	1.76	2.48	1.33	1.45	1.69	2.26	1.32	1.49	1.77	2.67	1.16	1.29	1.61	2.44	1.63	1.76	2.01	2.69
muGIF(?)	1.00	1.26	2.00	3.46	0.73	0.89	1.35	2.15	0.85	1.04	1.50	2.45	0.64	0.87	1.36	2.57	0.67	0.85	1.35	2.25	0.78	0.94	1.39	2.52
Park et al.(?)	1.66	2.47	3.44	5.55	1.19	1.47	2.06	3.1	1.19	1.56	2.15	3.04	1.34	1.73	2.41	3.85	1.2	1.5	2.13	2.95	1.26	1.65	2.46	3.66
Shen et al.(?)	1.79	2.21	3.2	5.04	1.34	1.69	2.25	3.13	1.37	1.58	2.05	2.85	1.49	1.74	2.34	3.5	1.34	1.56	2.09	2.99	1.29	1.55	2.19	3.33
Gu et al.(?)	0.61	1.46	2.98	5.09	0.52	0.95	1.87	2.98	0.63	1.02	1.89	2.92	0.58	1.14	2.21	3.58	0.53	0.96	1.89	2.99	0.52	1.07	2.17	3.59
Li et al.(?)	-	3.77	4.49	6.29	-	3.21	3.28	3.79	-	3.19	3.28	3.79	-	3.34	3.61	4.45	-	3.23	3.35	3.92	-	3.39	3.65	4.54
Ours	0.69	1.07	1.65	2.96	0.55	0.81	1.22	1.78	0.62	0.9	1.27	1.84	0.61	0.89	1.28	2.12	0.51	0.75	1.12	1.71	0.56	0.87	1.27	2.08

Table 2: Quantitative comparison on the real ToF dataset. The errors are calculated as MAE to the measured ground truth in mm. The best results are in bold. The second best results are underlined.

	Books	Devil	Shark
Bicubic	16.23mm	17.78mm	16.66mm
GF(?)	15.55mm	16.1mm	17.1mm
SD Filter(?)	13.47mm	15.99mm	16.18mm
SG-WLS(?)	14.71mm	16.24mm	16.51mm
Shen et al.(?)	15.47mm	16.18mm	17.33mm
Park et al.(?)	14.31mm	15.36mm	15.88mm
TGV(?)	12.8mm	14.97mm	15.53mm
AR(?)	14.37mm	15.41mm	16.27mm
Gu et al.(?)	13.87mm	15.36mm	15.88mm
SGF(?)	13.57mm	15.74mm	16.21mm
FGI(?)	14.21mm	16.43mm	16.37mm
FBS(?)	15.93mm	17.21mm	16.33mm
Li et al.(?)	14.33mm	15.09mm	15.82mm
Ours	12.49mm	14.51mm	15.02mm

Equations

h_{T}(x)=\begin{cases}h(x), & |x|\leq b\\ b-\frac{a}{2}, & |x|>b\end{cases}\quad\text{s.t.}\ a\leq b,

h(x)=\begin{cases}\frac{1}{2a}x^{2}, & |x|<a\\ |x|-\frac{a}{2}, & |x|\geq a\end{cases},

E_{u}(u)=\sum_{i}\left(\sum_{j\in N_{d}(i)}h_{T}(u_{i}-f_{j})+\lambda\sum_{j\in N_{s}(i)}\omega_{i,j}h_{T}(u_{i}-u_{j})\right),

\omega_{i,j}=\frac{1}{\left(|g_{i}-g_{j}|+\delta\right)^{\alpha}},

h_{T}(\nabla^{\ast}_{i,j})=\min_{l^{\ast}_{i,j}}\left\{h(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})+\left(b_{\ast}-\frac{a_{\ast}}{2}\right)|l^{\ast}_{i,j}|_{0}\right\},

l^{\ast}_{i,j}=\begin{cases}0, & |\nabla^{\ast}_{i,j}|\leq b_{\ast}\\ \nabla^{\ast}_{i,j}, & |\nabla^{\ast}_{i,j}|>b_{\ast}\end{cases},\quad\ast\in\{d,s\}.

E_{ul}(u,l^{d},l^{s})=\sum_{i,j}\left(h(\nabla^{d}_{i,j}-l^{d}_{i,j})+\left(b_{d}-\frac{a_{d}}{2}\right)|l^{d}_{i,j}|_{0}\right)+\lambda\sum_{i,j}\omega_{i,j}\left(h(\nabla^{s}_{i,j}-l^{s}_{i,j})+\left(b_{s}-\frac{a_{s}}{2}\right)|l^{s}_{i,j}|_{0}\right).

E_{u}(u)=\min_{l^{\ast}}E_{ul}(u,l^{d},l^{s}),\quad\ast\in\{d,s\}.

h(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})=\min_{\mu^{\ast}_{i,j}}\left\{\mu^{\ast}_{i,j}(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})^{2}+\psi(\mu^{\ast}_{i,j})\right\},

\mu^{\ast}_{i,j}=\begin{cases}\frac{1}{2a_{\ast}}, & |\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|<a_{\ast}\\ \frac{1}{2|\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|}, & |\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|\geq a_{\ast}\end{cases},\quad\ast\in\{d,s\}.

E_{ul\mu}(u,l^{d},l^{s},\mu^{d},\mu^{s})=\sum_{i,j}\left(\mu^{d}_{i,j}(\nabla^{d}_{i,j}-l^{d}_{i,j})^{2}+\psi(\mu^{d}_{i,j})+\left(b_{d}-\frac{a_{d}}{2}\right)|l^{d}_{i,j}|_{0}\right)+\lambda\sum_{i,j}\omega_{i,j}\left(\mu^{s}_{i,j}(\nabla^{s}_{i,j}-l^{s}_{i,j})^{2}+\psi(\mu^{s}_{i,j})+\left(b_{s}-\frac{a_{s}}{2}\right)|l^{s}_{i,j}|_{0}\right).

E_{ul}(u,l^{\ast})=\min_{\mu^{\ast}}E_{ul\mu}(u,l^{\ast},\mu^{\ast}),\quad\ast\in\{d,s\}.

u^{k+1}=\arg\min_{u}E_{ul\mu}(u,(l^{\ast})^{k},(\mu^{\ast})^{k}),

u^{k+1}=(\mathcal{A}^{k}-2\lambda\mathcal{W}^{k})^{-1}(D^{k}+2\lambda S^{k}),

E_{u}(u)\leq E_{ul}(u,(l^{\ast})^{k}),\quad E_{u}(u^{k})=E_{ul}(u^{k},(l^{\ast})^{k}),

\left\{\begin{array}{l}E_{ul}(u,(l^{\ast})^{k})\leq E_{ul\mu}(u,(l^{\ast})^{k},(\mu^{\ast})^{k})\\ E_{ul}(u^{k},(l^{\ast})^{k})=E_{ul\mu}(u^{k},(l^{\ast})^{k},(\mu^{\ast})^{k})\end{array}\right..

E_{ul}(u^{k+1},(l^{\ast})^{k})\leq E_{ul\mu}(u^{k+1},(l^{\ast})^{k},(\mu^{\ast})^{k})\leq E_{ul\mu}(u^{k},(l^{\ast})^{k},(\mu^{\ast})^{k})=E_{ul}(u^{k},(l^{\ast})^{k}),

E_{u}(u^{k+1})\leq E_{ul}(u^{k+1},(l^{\ast})^{k})\leq E_{ul}(u^{k},(l^{\ast})^{k})=E_{u}(u^{k}),

Code & Models

Repositories

wliusjtu/Generalized-Smoothing-Framework

Taxonomy

Topics: Image Enhancement Techniques · Advanced Image Fusion Techniques · Sparse and Compressive Sensing Techniques

Full text

A Generalized Framework for Edge-preserving and Structure-preserving Image Smoothing

Wei Liu1,2, Pingping Zhang3, Yinjie Lei4∗, Xiaolin Huang1,5, Jie Yang1,5, Ian Reid2

1Department of Automation, Shanghai Jiao Tong University, 2The University of Adelaide

3Dalian University of Technology, 4Sichuan University, 5Institute of Medical Robotics, Shanghai Jiao Tong University

{wei.liu02, ian.reid}@adelaide.edu.au, [email protected], [email protected], {xiaolinhuang, jieyang}@sjtu.edu.cn

Jie Yang and Yinjie Lei are the corresponding authors of this paper.

Abstract

Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can differ, or even contradict each other, across tasks. Nevertheless, the inherent smoothing nature of a given smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed to achieve diverse smoothing natures, where even contradictory smoothing behaviors can be achieved. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. When combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform the state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided, and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.

Introduction

The key challenge of many tasks in both computer vision and graphics can be attributed to image smoothing. At the same time, the required smoothing properties can vary dramatically for different tasks. In this paper, depending on the required smoothing properties, we roughly classify a large number of applications into four groups.

Applications in the first group require the smoothing operator to smooth out small details while preserving strong edges, and the amplitudes of these strong edges can be reduced but the edges should be neither blurred nor sharpened. Representatives in this group are image detail enhancement and HDR tone mapping (?; ?; ?). Blurring edges can result in halos while sharpening edges will lead to gradient reversals (?).

The second group includes tasks like clip-art compression artifacts removal (?; ?), image abstraction and pencil sketch production (?). In contrast to those in the first group, these tasks require smoothing out small details while sharpening strong edges. This is because edges can be blurred in the compressed clip-art image and need to be sharpened when the image is recovered (see Fig. 1(b) for an example). Sharper edges also produce better visual quality in image abstraction and pencil sketch production. At the same time, the amplitudes of strong edges are not allowed to be reduced in these tasks.

Guided image filtering, such as guided depth map upsampling (?; ?; ?) and flash/no flash filtering (?; ?), is categorized into the third group. The structure inconsistency between the guidance image and the target image, which can cause blurred edges and texture copy artifacts in the smoothed image (?; ?), should be properly handled by the specially designed smoothing operator. These tasks also need to sharpen edges in the smoothed image, because low-quality depth capture and noise in the no flash images can lead to blurred edges (see Fig. 1(c) for an example).

Tasks in the fourth group require smoothing the image in a scale-aware manner, e.g., image texture removal (?; ?; ?). These tasks require smoothing out small structures even when they contain strong edges, while large structures should be properly preserved even when their edges are weak (see Fig. 1(d) for an example). This is totally different from the first three groups, which all aim at preserving strong edges.

To be more explicit, we categorize the smoothing procedures in the first to the third groups as edge-preserving image smoothing since they try to preserve salient edges, while the smoothing processes in the fourth group are classified as structure-preserving image smoothing because they aim at preserving salient structures.

A diversity of edge-preserving and structure-preserving smoothing operators have been proposed for various tasks. Generally, each of them is designed to meet the requirements of certain applications, and thus its inherent smoothing nature is usually fixed. Therefore, there is seldom a smoothing operator that can meet all the smoothing requirements of the above four groups, which are quite different or even contradictory. For example, the $L_{0}$ norm smoothing (?) can sharpen strong edges and is suitable for clip-art compression artifacts removal; however, this leads to gradient reversals in image detail enhancement and HDR tone mapping. The weighted least squares (WLS) smoothing (?) performs well in image detail enhancement and HDR tone mapping, but it is capable of neither sharpening edges nor structure-preserving smoothing.

In contrast to most of the smoothing operators in the literature, a new smoothing operator, which is based on a non-convex non-smooth optimization framework, is proposed in this paper. It can achieve different and even contradictory smoothing behaviors and is able to handle the applications in the four groups mentioned above. The main contributions of this paper are as follows:

1. We introduce the truncated Huber penalty function, which has seldom been used in image smoothing. By varying its parameters, it shows strong flexibility.

2. A robust non-convex non-smooth optimization framework is proposed. When combined with the strong flexibility of the truncated Huber penalty function, our model can achieve various and even contradictory smoothing behaviors. We show that it is able to handle the tasks in the four groups mentioned above. This has seldom been achieved by previous smoothing operators.

3. An efficient numerical solution to the proposed optimization framework is provided. Its convergence is theoretically guaranteed.

4. Our method is able to outperform the specially designed approaches in many tasks, and state-of-the-art performance is achieved.

Related Work

A great number of smoothing operators have been proposed in recent decades. In terms of edge-preserving smoothing, the bilateral filter (BLF) (?) is an early work that has been used in various tasks such as image detail enhancement (?), HDR tone mapping (?), etc. However, it is prone to produce results with gradient reversals and halos (?). Its alternatives (?; ?) share a similar problem. The guided filter (GF) (?) can produce results free of gradient reversals, but halos still exist. The WLS smoothing (?) solves a global optimization problem and performs well in handling these artifacts. The $L_{0}$ norm smoothing is able to eliminate low-amplitude structures while sharpening strong edges, which can be applied to the tasks in the second group. To handle the structure inconsistency problem, Shen et al. (?) proposed to perform mutual-structure joint filtering. They also explored the relation between the guidance image and the target image via optimizing a scale map (?); however, additional processing was adopted for structure inconsistency handling. Ham et al. (?) proposed to handle the structure inconsistency by combining a static guidance weight with a Welsch’s penalty (?) regularized smoothness term, which led to a static/dynamic (SD) filter. Gu et al. (?) presented a weighted analysis representation model for guided depth map enhancement.

In terms of structure-preserving smoothing, Zhang et al. (?) proposed to smooth structures of different scales with a rolling guidance filter (RGF). Cho et al. (?) modified the original BLF with local patch-based analysis of texture features and obtained a bilateral texture filter (BTF) for image texture removal. Karacan et al. (?) proposed to smooth image textures by making use of region covariances that captured local structure and textural information. Xu et al. (?) adopted the relative total variation (RTV) as a prior to regularize the texture smoothing procedure. Fan et al. (?; ?) proposed to perform various kinds of image smoothing through convolutional neural networks. Chen et al. (?) proved that the TV- $L_{1}$ model (?; ?) could smooth images in a scale-aware manner, and it is thus ideal for structure-preserving smoothing such as image texture removal (?).

Most of the approaches mentioned above are limited to a few applications because their inherent smoothing natures are usually fixed. In contrast, our method proposed in this paper can have strong flexibility in achieving various smoothing behaviors, which enables wider applications of our method than most of them. Moreover, our method can show better performance than these methods in several applications that they are specially designed for.

Our Approach

Truncated Huber Penalty Function

We first introduce the truncated Huber penalty function which is defined as:

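$$h_{T}(x)=\begin{cases}h(x), & |x|\leq b\\ b-\frac{a}{2}, & |x|>b\end{cases}\quad\text{s.t.}\ a\leq b, \tag{3}$$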

where $a,b$ are constants. $h(\cdot)$ is the Huber penalty function (?) defined as:

$$h(x)=\begin{cases}\frac{1}{2a}x^{2}, & |x|<a\\ |x|-\frac{a}{2}, & |x|\geq a\end{cases}, \tag{6}$$

$h_{T}(\cdot)$ and $h(\cdot)$ are plotted in Fig. 2(a) with $a=\epsilon$, where $\epsilon$ is a sufficiently small value (e.g., $\epsilon=10^{-7}$). $h(\cdot)$ is an edge-preserving penalty function, but it cannot sharpen edges when adopted to regularize the smoothing procedure. In contrast, $h_{T}(\cdot)$ can sharpen edges because, due to the truncation, it does not further penalize image edges whose amplitude exceeds $b$. The Welsch’s penalty function (?), which was adopted in the recently proposed SD filter (?), is also plotted in the figure. This penalty function is known to be capable of sharpening edges, which is also because it seldom penalizes strong image edges. The Welsch’s penalty function is close to the $L_{2}$ norm when the input is small, while $h_{T}(\cdot)$ can be close to the $L_{1}$ norm when $a$ is set sufficiently small, which indicates that $h_{T}(\cdot)$ can better preserve weak edges than the Welsch’s penalty function.

With different parameter settings, $h_{T}(\cdot)$ shows strong flexibility and yields different penalty behaviors. Assume the input intensity values are within $[0,I_{m}]$; then the amplitude of any edge falls in $[0,I_{m}]$. We first set $a=\epsilon$. If we then set $b>I_{m}$, $h_{T}(\cdot)$ is actually the same as $h(\cdot)$ because the second condition in Eq. (3) can never be met. Because $a$ is sufficiently small, $h_{T}(\cdot)$ is close to the $L_{1}$ norm in this case, and thus it is an edge-preserving penalty function that does not sharpen edges. Conversely, when we set $b<I_{m}$, the truncation in $h_{T}(\cdot)$ is activated. Weak edges are then penalized while strong edges are not penalized further, and thus the strong edges are sharpened. In short, $b$ acts as a switch that decides whether $h_{T}(\cdot)$ can sharpen edges or not. Similarly, by setting $a=b>I_{m}$ or $a=b<I_{m}$, $h_{T}(\cdot)$ can be easily switched between the $L_{2}$ norm and the truncated $L_{2}$ norm. Note that the truncated $L_{2}$ norm is also able to sharpen edges (?). In contrast, the Welsch’s penalty function does not enjoy this kind of flexibility. Different cases of $h_{T}(\cdot)$ are illustrated in Fig. 2(b).
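The switching behavior described above can be checked with a few lines of code. The following is a minimal, dependency-free sketch of $h(\cdot)$ and $h_{T}(\cdot)$ for scalar inputs; the function names, the choice $I_{m}=1$ and the sample values are illustrative assumptions, not taken from the authors' implementation.

```python
def huber(x, a):
    """Huber penalty h(x): quadratic for |x| < a, linear (slope 1) beyond."""
    ax = abs(x)
    if ax < a:
        return x * x / (2.0 * a)
    return ax - a / 2.0


def truncated_huber(x, a, b):
    """Truncated Huber penalty h_T(x): h(x) up to |x| = b, constant afterwards."""
    if a > b:
        raise ValueError("the definition requires a <= b")
    if abs(x) <= b:
        return huber(x, a)
    return b - a / 2.0


I_M = 1.0   # assumed intensity range [0, I_m]
EPS = 1e-7  # the small value of a used in the text

# b > I_m: the truncation never fires, so h_T is close to the L1 norm on [0, I_m].
l1_like = truncated_huber(0.5, a=EPS, b=2.0 * I_M)

# b < I_m: every edge larger than b costs the same, so strong edges
# are not penalized any further once they exceed b.
weak_edge = truncated_huber(0.05, a=EPS, b=0.1)
strong_edge_1 = truncated_huber(0.4, a=EPS, b=0.1)
strong_edge_2 = truncated_huber(0.9, a=EPS, b=0.1)

# a = b > I_m: purely quadratic (a scaled L2 penalty) on the whole range.
l2_like = truncated_huber(0.5, a=2.0, b=2.0)
```

With these settings, `strong_edge_1` and `strong_edge_2` are identical, which is the property that allows strong edges to be sharpened rather than flattened.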

Model

Given an input image $f$ and a guidance image $g$ , the smoothed output image $u$ is the solution to the following objective function:

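$$E_{u}(u)=\sum_{i}\left(\sum_{j\in N_{d}(i)}h_{T}(u_{i}-f_{j})+\lambda\sum_{j\in N_{s}(i)}\omega_{i,j}h_{T}(u_{i}-u_{j})\right), \tag{7}$$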

where $h_{T}$ is defined in Eq.(3); $N_{d}(i)$ is the $(2r_{d}+1)\times(2r_{d}+1)$ square patch centered at $i$ ; $N_{s}(i)$ is the $(2r_{s}+1)\times(2r_{s}+1)$ square patch centered at $i$ ; $\lambda$ is a parameter that controls the overall smoothing strength. To be clear, we adopt $\{a_{d},b_{d}\}$ and $\{a_{s},b_{s}\}$ to denote the parameters of $h_{T}(\cdot)$ in the data term and smoothness term, respectively. The guidance weight $\omega_{i,j}$ is defined as:

$$\omega_{i,j}=\frac{1}{\left(|g_{i}-g_{j}|+\delta\right)^{\alpha}}, \tag{8}$$

where $\alpha$ determines the sensitivity to the edges in $g$ which can be the input image, i.e., $g=f$ . $|\cdot|$ represents the absolute value. $\delta$ is a small constant being set as $\delta=10^{-7}$ .
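As a small illustration of how the guidance weight behaves, the sketch below evaluates $\omega_{i,j}$ for a pair of guidance values. The function name and the sample intensities are assumptions made for this example only.

```python
def guidance_weight(g_i, g_j, alpha, delta=1e-7):
    """omega_{i,j} = 1 / (|g_i - g_j| + delta)^alpha."""
    return 1.0 / (abs(g_i - g_j) + delta) ** alpha


# Similar guidance values give a large weight (strong smoothing across the pair).
flat = guidance_weight(0.50, 0.51, alpha=0.5)

# A strong guidance edge gives a small weight (little smoothing across it).
edge = guidance_weight(0.10, 0.90, alpha=0.5)

# alpha = 0 switches the guidance off: every pair gets weight 1.
uniform = guidance_weight(0.10, 0.90, alpha=0.0)
```

Larger values of $\alpha$ widen the gap between `flat` and `edge`, which is the sense in which $\alpha$ controls the sensitivity to edges in $g$.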

The adoption of $h_{T}(\cdot)$ gives our model in Eq. (7) strong flexibility. As will be shown in the following property analysis section, with different parameter settings, our model is able to achieve different smoothing behaviors, and thus it is capable of various tasks that require either edge-preserving smoothing or structure-preserving smoothing.

Numerical Solution

Our model in Eq. (7) is not only non-convex but also non-smooth, which arises from the adopted $h_{T}(\cdot)$ . Commonly used approaches (?; ?; ?; ?) for solving non-convex optimization problems are not applicable. To tackle this problem, we first rewrite $h_{T}(\cdot)$ in a new equivalent form. By defining $\nabla^{d}_{i,j}=u_{i}-f_{j}$ and $\nabla^{s}_{i,j}=u_{i}-u_{j}$ , we have:

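$$h_{T}(\nabla^{\ast}_{i,j})=\min_{l^{\ast}_{i,j}}\left\{h(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})+\left(b_{\ast}-\frac{a_{\ast}}{2}\right)|l^{\ast}_{i,j}|_{0}\right\}, \tag{9}$$

where $\ast\in\{d,s\}$ and $|l^{\ast}_{i,j}|_{0}$ is the $L_{0}$ norm of $l^{\ast}_{i,j}$. The minimum of the right side of Eq. (9) is attained under the condition: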

$$l^{\ast}_{i,j}=\begin{cases}0, & |\nabla^{\ast}_{i,j}|\leq b_{\ast}\\ \nabla^{\ast}_{i,j}, & |\nabla^{\ast}_{i,j}|>b_{\ast}\end{cases},\quad\ast\in\{d,s\}. \tag{12}$$

The detailed proof of Eq. (9) and Eq. (12) is provided in our supplementary file. These two equations also theoretically validate our analysis in Fig. 2(b): we have $|\nabla^{\ast}_{i,j}|\in[0,I_{m}]$ if the intensity values are in $[0,I_{m}]$. Then if $b>I_{m}$, based on Eq. (9) and Eq. (12), we always have $h_{T}(\nabla^{\ast}_{i,j})=h(\nabla^{\ast}_{i,j})$, which means $h_{T}(\cdot)$ reduces to $h(\cdot)$.
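The full proof is in the supplementary file, which is not reproduced here. The short argument below is only a sketch of why Eq. (9) and Eq. (12) are plausible, written with the shorthand $x=\nabla^{\ast}_{i,j}$, $a=a_{\ast}$, $b=b_{\ast}$ and an auxiliary function $J(l)$ introduced for this note.

```latex
% Shorthand (introduced for this note): J(l) = h(x - l) + (b - a/2) |l|_0, with a <= b.
\begin{aligned}
l = 0:\quad      & J(0) = h(x), \\
l \neq 0:\quad   & J(l) = h(x - l) + \left(b - \tfrac{a}{2}\right) \;\geq\; b - \tfrac{a}{2},
                   \quad \text{with equality iff } l = x
                   \ \ (\text{since } h \geq 0 \text{ and } h(y) = 0 \iff y = 0), \\
\Rightarrow\quad & \min_{l} J(l) = \min\left\{ h(x),\; b - \tfrac{a}{2} \right\}. \\[4pt]
\text{For } a \leq b:\quad
                 & |x| < a \;\Rightarrow\; h(x) = \tfrac{x^{2}}{2a} < \tfrac{a}{2} \leq b - \tfrac{a}{2}, \\
                 & |x| \geq a \;\Rightarrow\; h(x) = |x| - \tfrac{a}{2} \leq b - \tfrac{a}{2} \iff |x| \leq b, \\
\Rightarrow\quad & \min_{l} J(l) =
                   \begin{cases} h(x), & |x| \leq b \quad (l = 0) \\
                                 b - \tfrac{a}{2}, & |x| > b \quad (l = x) \end{cases}
                   \;=\; h_{T}(x).
\end{aligned}
```

The case split reproduces both the value of $h_{T}$ in Eq. (3) and the minimizer in Eq. (12).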

A new energy function is defined as:

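$$E_{ul}(u,l^{d},l^{s})=\sum_{i,j}\left(h(\nabla^{d}_{i,j}-l^{d}_{i,j})+\left(b_{d}-\frac{a_{d}}{2}\right)|l^{d}_{i,j}|_{0}\right)+\lambda\sum_{i,j}\omega_{i,j}\left(h(\nabla^{s}_{i,j}-l^{s}_{i,j})+\left(b_{s}-\frac{a_{s}}{2}\right)|l^{s}_{i,j}|_{0}\right). \tag{13}$$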

Based on Eq. (9) and Eq. (12), we then have:

$$E_{u}(u)=\min_{l^{\ast}}E_{ul}(u,l^{d},l^{s}),\quad\ast\in\{d,s\}. \tag{14}$$

Given Eq. (12) as the optimum condition of Eq. (14) with respect to $l^{\ast}$ , optimizing $E_{ul}(u,l^{d},l^{s})$ with respect to $u$ only involves Huber penalty function $h(\cdot)$ . The problem can thus be optimized through the half-quadratic (HQ) optimization technique (?; ?). More specifically, a variable $\mu^{\ast}(\ast\in\{d,s\})$ and a function $\psi(\mu^{\ast}_{i,j})$ with respect to $\mu^{\ast}$ exist such that:

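$$h(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})=\min_{\mu^{\ast}_{i,j}}\left\{\mu^{\ast}_{i,j}(\nabla^{\ast}_{i,j}-l^{\ast}_{i,j})^{2}+\psi(\mu^{\ast}_{i,j})\right\}, \tag{15}$$

where the minimum is attained under the condition: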

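$$\mu^{\ast}_{i,j}=\begin{cases}\frac{1}{2a_{\ast}}, & |\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|<a_{\ast}\\ \frac{1}{2|\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|}, & |\nabla^{\ast}_{i,j}-l^{\ast}_{i,j}|\geq a_{\ast}\end{cases},\quad\ast\in\{d,s\}. \tag{18}$$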
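The half-quadratic step can be sanity-checked numerically. The sketch below assumes only what the text states: the weight $\mu$ of Eq. (18) and the fact that the minimum in Eq. (15) is attained at that weight, so that $\psi(\mu(x))=h(x)-\mu(x)x^{2}$. It then checks on a grid that the resulting quadratic lies above $h$ everywhere and touches it at the expansion point. The grid, the value $a=0.1$ and all function names are illustrative choices.

```python
def huber(x, a):
    """Huber penalty h(x)."""
    ax = abs(x)
    return x * x / (2.0 * a) if ax < a else ax - a / 2.0


def hq_weight(x, a):
    """Half-quadratic weight mu for a residual x, following Eq. (18)."""
    ax = abs(x)
    return 1.0 / (2.0 * a) if ax < a else 1.0 / (2.0 * ax)


def surrogate(y, x, a):
    """Quadratic mu(x) * y^2 + psi(mu(x)), with psi fixed by equality at y = x."""
    mu = hq_weight(x, a)
    psi = huber(x, a) - mu * x * x
    return mu * y * y + psi


a = 0.1
grid = [k / 20.0 for k in range(-20, 21)]

# Smallest value of (surrogate - h) over all pairs: should not be negative.
worst_gap = min(surrogate(y, x, a) - huber(y, a) for x in grid for y in grid)

# Largest mismatch at the expansion point y = x: should be zero.
touch_gap = max(abs(surrogate(x, x, a) - huber(x, a)) for x in grid)
```

Both properties are what make the later majorization argument work: replacing $h$ by the quadratic can only overestimate the energy, and the estimate is exact at the current iterate.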

The detailed proof of Eq. (15) and Eq. (18) is provided in our supplementary file. Then we can further define a new energy function:

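$$E_{ul\mu}(u,l^{d},l^{s},\mu^{d},\mu^{s})=\sum_{i,j}\left(\mu^{d}_{i,j}(\nabla^{d}_{i,j}-l^{d}_{i,j})^{2}+\psi(\mu^{d}_{i,j})+\left(b_{d}-\frac{a_{d}}{2}\right)|l^{d}_{i,j}|_{0}\right)+\lambda\sum_{i,j}\omega_{i,j}\left(\mu^{s}_{i,j}(\nabla^{s}_{i,j}-l^{s}_{i,j})^{2}+\psi(\mu^{s}_{i,j})+\left(b_{s}-\frac{a_{s}}{2}\right)|l^{s}_{i,j}|_{0}\right). \tag{19}$$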

Based on Eq. (15) and Eq. (18), we then have:

$$E_{ul}(u,l^{\ast})=\min_{\mu^{\ast}}E_{ul\mu}(u,l^{\ast},\mu^{\ast}),\quad\ast\in\{d,s\}. \tag{20}$$

Given Eq. (18) as the optimum condition of $\mu^{\ast}$ for Eq. (20), optimizing $E_{ul\mu}(u,l^{d},l^{s},\mu^{d},\mu^{s})$ with respect to $u$ only involves the $L_{2}$ norm penalty function, which has a closed-form solution. However, since the optimum conditions in Eq. (12) and Eq. (18) both involve $u$, the final solution $u$ can only be obtained in an iterative manner. Assuming we have obtained $u^{k}$, $(l^{\ast})^{k}$ and $(\mu^{\ast})^{k}$ $(\ast\in\{s,d\})$ can be updated through Eq. (12) and Eq. (18) with $u^{k}$. Finally, $u^{k+1}$ is obtained with:

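$$u^{k+1}=\arg\min_{u}E_{ul\mu}(u,(l^{\ast})^{k},(\mu^{\ast})^{k}), \tag{21}$$

Eq. (21) has a closed-form solution:

$$u^{k+1}=(\mathcal{A}^{k}-2\lambda\mathcal{W}^{k})^{-1}(D^{k}+2\lambda S^{k}), \tag{22}$$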

where $\mathcal{W}^{k}$ is an affinity matrix with $\mathcal{W}^{k}_{i,j}=\omega_{i,j}(\mu^{s}_{i,j})^{k}$ , $\mathcal{A}^{k}$ is a diagonal matrix with $\mathcal{A}^{k}_{ii}=\sum_{j\in N_{d}(i)}(\mu^{d}_{i,j})^{k}+2\lambda\sum_{j\in N_{s}(i)}\omega_{i,j}(\mu^{s}_{i,j})^{k}$ , $D^{k}$ is a vector with $D^{k}_{i}=\sum_{j\in N_{d}(i)}(\mu^{d}_{i,j})^{k}(f_{j}+(l^{d}_{i,j})^{k})$ and $S^{k}$ is also a vector with $S^{k}_{i}=\sum_{j\in N_{s}(i)}\omega_{i,j}(\mu^{s}_{i,j})^{k}(l^{s}_{i,j})^{k}$ .

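The above optimization procedure monotonically decreases the value of $E_{u}(u)$ in each step, so its convergence is theoretically guaranteed. Given $u^{k}$ in the $k$th iteration and $\ast\in\{s,d\}$, then for any $u$, we have:

$$E_{u}(u)\leq E_{ul}(u,(l^{\ast})^{k}),\quad E_{u}(u^{k})=E_{ul}(u^{k},(l^{\ast})^{k}), \tag{23}$$

$$\left\{\begin{array}{l}E_{ul}(u,(l^{\ast})^{k})\leq E_{ul\mu}(u,(l^{\ast})^{k},(\mu^{\ast})^{k})\\ E_{ul}(u^{k},(l^{\ast})^{k})=E_{ul\mu}(u^{k},(l^{\ast})^{k},(\mu^{\ast})^{k})\end{array}\right.. \tag{26}$$

Given $(l^{\ast})^{k}$ has been updated through Eq. (12), Eq. (23) is based on Eq. (14) and Eq. (9). After $(\mu^{\ast})^{k}$ has been updated through Eq. (18), Eq. (26) is based on Eq. (20) and Eq. (15). We now have:

$$E_{ul}(u^{k+1},(l^{\ast})^{k})\leq E_{ul\mu}(u^{k+1},(l^{\ast})^{k},(\mu^{\ast})^{k})\leq E_{ul\mu}(u^{k},(l^{\ast})^{k},(\mu^{\ast})^{k})=E_{ul}(u^{k},(l^{\ast})^{k}), \tag{27}$$

where the first and second inequalities follow from Eq. (26) and Eq. (21), respectively. We finally have:

$$E_{u}(u^{k+1})\leq E_{ul}(u^{k+1},(l^{\ast})^{k})\leq E_{ul}(u^{k},(l^{\ast})^{k})=E_{u}(u^{k}), \tag{28}$$

where the first and second inequalities follow from Eq. (23) and Eq. (27), respectively. Since the value of $E_{u}(u)$ is bounded from below, Eq. (28) indicates that the convergence of our iterative scheme is theoretically guaranteed.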

The above optimization procedure is iteratively performed $N$ times to get the output $u^{N}$ . In all our experiments, we set $u^{0}=f$ , which is able to produce promising results in each application. Our optimization procedure is summarized in Algorithm 1.
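To make the iteration concrete, here is a minimal 1D sketch of the procedure in pure Python. It is not the authors' MATLAB code, and it makes several simplifying assumptions that are not in the text: the signal is 1D, $r_{d}=0$ and $r_{s}=1$ (so the linear system is tridiagonal and can be solved with the Thomas algorithm), $g=f$, and $a_{d}=a_{s}=10^{-3}$ instead of $\epsilon=10^{-7}$ to keep the toy system well conditioned. All function names are hypothetical. The energy $E_{u}(u)$ is recorded after every iteration so that the monotone decrease can be inspected.

```python
def huber(x, a):
    """Huber penalty h(x)."""
    ax = abs(x)
    return x * x / (2.0 * a) if ax < a else ax - a / 2.0


def truncated_huber(x, a, b):
    """Truncated Huber penalty h_T(x), constant for |x| > b (requires a <= b)."""
    return huber(x, a) if abs(x) <= b else b - a / 2.0


def split(grad, a, b):
    """Return (l, mu) for one difference: l from Eq. (12), then mu from Eq. (18)."""
    l = grad if abs(grad) > b else 0.0
    r = abs(grad - l)
    mu = 1.0 / (2.0 * a) if r < a else 1.0 / (2.0 * r)
    return l, mu


def energy(u, f, w, lam, a_d, b_d, a_s, b_s):
    """E_u(u) of Eq. (7) for r_d = 0, r_s = 1 on a 1D signal."""
    data = sum(truncated_huber(ui - fi, a_d, b_d) for ui, fi in zip(u, f))
    smooth = sum(w[i] * truncated_huber(u[i] - u[i + 1], a_s, b_s)
                 for i in range(len(u) - 1))
    return data + 2.0 * lam * smooth  # every neighbor pair is counted twice


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth_1d(f, lam=0.5, alpha=0.5, a_d=1e-3, b_d=10.0, a_s=1e-3, b_s=10.0,
              n_iter=10, delta=1e-7):
    """Iterate Eq. (12), Eq. (18) and Eq. (22) with g = f and u^0 = f."""
    n = len(f)
    w = [1.0 / (abs(f[i] - f[i + 1]) + delta) ** alpha for i in range(n - 1)]
    u = list(f)
    energies = [energy(u, f, w, lam, a_d, b_d, a_s, b_s)]
    for _ in range(n_iter):
        diag, lower, upper, rhs = ([0.0] * n for _ in range(4))
        for i in range(n):  # data term: contributes to A_ii and D_i
            l_d, mu_d = split(u[i] - f[i], a_d, b_d)
            diag[i] += mu_d
            rhs[i] += mu_d * (f[i] + l_d)
        for i in range(n - 1):  # smoothness term for the pair (i, i+1)
            l_s, mu_s = split(u[i] - u[i + 1], a_s, b_s)
            c = 2.0 * lam * w[i] * mu_s  # 2 * lambda * W_{i,i+1}
            diag[i] += c
            diag[i + 1] += c
            upper[i] = -c
            lower[i + 1] = -c
            rhs[i] += c * l_s       # l_{i,i+1} enters S_i with a plus sign
            rhs[i + 1] -= c * l_s   # l_{i+1,i} = -l_{i,i+1}
        u = solve_tridiagonal(lower, diag, upper, rhs)
        energies.append(energy(u, f, w, lam, a_d, b_d, a_s, b_s))
    return u, energies


# Toy input: a unit step with a small zigzag texture on both sides.
f = [0.02 * (-1) ** i + (1.0 if i >= 20 else 0.0) for i in range(40)]
u_plain, e_plain = smooth_1d(f)                     # b > I_m: no truncation
u_sharp, e_sharp = smooth_1d(f, b_d=0.2, b_s=0.2)   # b < I_m: truncation active
```

In both runs the recorded energies should form a non-increasing sequence, in line with Eq. (28). For images, the same assembly applies with 2D neighborhoods, but the system is then sparse rather than tridiagonal and needs a general sparse solver.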

Property Analysis

With different parameter settings, the strong flexibility of $h_{T}(\cdot)$ makes our model able to achieve various smoothing behaviors. First, we show that some classical approaches can be viewed as special cases of our model. For example, by setting $a_{d}=b_{d}>I_{m},a_{s}=\epsilon,b_{s}>I_{m},\alpha=0,r_{d}=0,r_{s}=1$, our model is an approximation of the TV model (?), which is a representative edge-preserving smoothing operator. If we set $\alpha=0.2,g=f$ with the other parameters the same as above, then the first iteration of Algorithm 1 is the WLS smoothing (?), which performs well in handling gradient reversals and halos in image detail enhancement and HDR tone mapping. With parameters $a_{d}=\epsilon,b_{d}>I_{m},a_{s}=\epsilon,b_{s}>I_{m},\alpha=0,r_{d}=0,r_{s}=1$, our model yields a smoothing nature very close to that of the TV-$L_{1}$ model (?), which is a classical model for structure-preserving smoothing.

For different kinds of applications, our model can produce better results than the special cases mentioned above. For convenience, we start with the tasks in the fourth group, which require structure-preserving smoothing. For these tasks, the parameters are set as $a_{d}=\epsilon,b_{d}>I_{m},a_{s}=\epsilon,b_{s}>I_{m},r_{d}=r_{s},\alpha=0.5,g=f$. This parameter setting has two advantages: first, the setting $a_{d}=\epsilon,b_{d}>I_{m},a_{s}=\epsilon,b_{s}>I_{m}$ enables our model to have a structure-preserving property similar to that of the TV-$L_{1}$ model; second, the guidance weight with $\alpha=0.5,g=f$ lets our model obtain sharper edges in the results than the TV-$L_{1}$ model does. We illustrate this with 1D smoothing results in Fig. 3(a) and (b). Fig. 6(b) and (c) further show a comparison of image texture removal results. As shown in the figure, both the TV-$L_{1}$ model and our model can properly remove the small textures; however, the edges in our result are much sharper than those in the result of the TV-$L_{1}$ model. The typical values for $r_{d}=r_{s}$ are $1\sim 3$ depending on the texture size. $\lambda$ is usually smaller than 1. Larger $r_{d},r_{s},\lambda$ cause larger structures to be removed. The iteration number is set as $N=10$.

When dealing with image detail enhancement and HDR tone mapping in the first group, one way is to set the parameters so that our model performs WLS smoothing. Alternatively, we can make use of the structure-preserving property of our model to produce better results. The parameters are set as follows: $a_{d}=\epsilon,b_{d}>I_{m},a_{s}=\epsilon,b_{s}>I_{m},r_{d}=r_{s},\alpha=0.2,g=f$. This parameter setting is based on the following observation in our experiments: when we adopt $N=1$ and set $\lambda$ to a large value, the amplitudes of different structures decrease at different rates, i.e., the amplitudes of small structures decrease more than those of large ones, as illustrated in Fig. 3(d). At the same time, edges are neither blurred nor sharpened. This kind of smoothing behavior is desirable for image detail enhancement and HDR tone mapping. As a comparison, Fig. 3(c) shows the result of the WLS smoothing. As can be observed from the figures, our method better preserves the edges (see the bottom of the 1D signals in Fig. 3(c) and (d)). Fig. 4(b) and (c) further show a comparison of image detail enhancement results. We fix $r_{d}=r_{s}=2$ and vary $\lambda$ to control the smoothing strength. $\lambda$ for the tasks in the first group is usually much larger than that for the ones in the fourth group; for example, the result in Fig. 4(c) is generated with $\lambda=20$.

To sharpen edges, as required by the tasks in the second and third groups, we can set $b_{s}<I_{m}$ in the smoothness term. In addition, we set the other parameters as $a_{d}=\epsilon,b_{d}<I_{m},a_{s}=\epsilon$ . The truncation $b_{d}<I_{m}$ in the data term helps our model to be robust against outliers in the input image, for example, the noise in a no-flash image or a low-quality depth map. The truncation $b_{s}<I_{m}$ in the smoothness term makes our model edge-preserving. By setting $a_{d}=a_{s}=\epsilon$ , our model further enjoys the structure-preserving property. With both edge-preserving and structure-preserving smoothing natures, our model is able to preserve large structures with weak edges and small structures with strong edges at the same time, which is challenging but of practical importance. Fig. 5(a) illustrates this case with an example of clip-art compression artifacts removal: both the thin black circle around the “wheel” and the gray part in the center of the “wheel” should be preserved. The challenge is twofold. On one hand, if we perform edge-preserving smoothing, the gray part will be removed because the corresponding edge is weak. Fig. 5(d) shows the result of the SD filter (?). The SD filter can properly preserve the thin black circle and sharpen the edges thanks to the adopted Welsch’s penalty function; however, it fails to preserve the weak edge between the black part and the gray part. On the other hand, if we adopt structure-preserving smoothing, then the thin black circle will be smoothed out due to its small structure size. Fig. 5(e) shows the corresponding result of our method with the structure-preserving parameter setting described above. In contrast, our method with the edge-preserving and structure-preserving parameter setting can preserve both parts and sharpen the edges, as shown in Fig. 5(f). Fig. 3(e) and (f) also show a comparison of the SD filter and our method with 1D smoothing results. We fix $\alpha=0.5,r_{d}=r_{s},N=10$ for the tasks in both the second and the third groups. We empirically set $b_{d}=b_{s}=0.05I_{m}\sim 0.2I_{m}$ and $r_{d}=r_{s}=1\sim 5$ depending on the task and the input noise level.
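
As a rough illustration of why the two parameter settings behave so differently, the sketch below evaluates a truncated Huber penalty at a few intensity differences. It assumes the usual Huber form (quadratic below $a$, linear above) held constant once the argument exceeds $b$; the definition given earlier in the paper takes precedence over this sketch. The function names, the numeric value used for $\epsilon$ and the scaling $I_{m}=1$ are illustrative assumptions.

```python
def huber(x, a):
    """Huber penalty (assumed form): quadratic for |x| < a, linear otherwise."""
    x = abs(x)
    return x * x / (2.0 * a) if x < a else x - a / 2.0


def truncated_huber(x, a, b):
    """Huber penalty capped at a constant once |x| exceeds b (requires a <= b)."""
    if a > b:
        raise ValueError("expected a <= b")
    return huber(x, a) if abs(x) <= b else b - a / 2.0


I_m = 1.0   # maximum intensity, assuming images scaled to [0, 1]
eps = 1e-3  # stand-in for the small constant epsilon (assumed value)

for diff in (0.0, 0.05, 0.1, 0.5, 1.0):
    # First-group style: b > I_m, so the cap is never reached on valid intensities
    untruncated = truncated_huber(diff, a=eps, b=1.5 * I_m)
    # Second/third-group style: b = 0.1 * I_m, inside the empirical range 0.05-0.2 I_m
    truncated = truncated_huber(diff, a=eps, b=0.1 * I_m)
    print(f"|x|={diff:.2f}  b>I_m: {untruncated:.4f}  b=0.1*I_m: {truncated:.4f}")
```

With $b>I_{m}$ the penalty keeps growing with the difference, whereas with $b<I_{m}$ every difference above $b$ costs the same amount, so large jumps are not penalized further. This matches the role the text assigns to the truncation in the smoothness and data terms.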

The structure inconsistency issue in the third group can also be easily handled by our model. Note that $\mu_{i,j}^{s}$ in Eq. (19) is computed with the smoothed image in each iteration, as formulated in Eq. (18); it can thus reflect the inherent nature of the smoothed image. The guidance weight $\omega_{i,j}$ provides additional structural information from the guidance image $g$ . This means that $\mu_{i,j}^{s}$ and $\omega_{i,j}$ complement each other. In fact, the equivalent guidance weight of Eq. (19) in each iteration is $\mu_{i,j}^{s}\omega_{i,j}$ , which reflects the properties of both the smoothed image and the guidance image. In this way, our model can properly handle the structure inconsistency problem and avoid blurred edges and texture copy artifacts. Similar ideas were also adopted in (?; ?).

Applications and Experimental Results

We apply our method to various tasks in the first to the fourth groups to validate its effectiveness. Comparisons with the state-of-the-art approaches in each application are also presented. Due to limited space, we only show experimental results of four applications.

Our experiments are performed on a PC with an Intel Core i5 3.4GHz CPU (one thread used) and 8GB memory. For an RGB image of size $800\times 600$ and $N=10$ in Algorithm 1, the running time is $10.04/25.09/43.11/69.82/96.73$ seconds in MATLAB for $r_{d}=r_{s}=1/2/3/4/5$ . Note that as described in the property analysis section, the value of $r_{d}=r_{s}$ is smaller than 3 in most cases except for guided depth map upsampling. For the tasks in the first group which require $N=1$ , the computational cost could be further reduced to $\frac{1}{10}$ of that mentioned above.
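
The effect of the iteration count on these timings can be worked through with a few lines of arithmetic. The sketch below only rescales the running times reported above; the linear dependence on $N$ follows the statement that $N=1$ reduces the cost to $\frac{1}{10}$, and the function name is hypothetical.

```python
# Reported MATLAB running times (seconds) for an 800x600 RGB image with N = 10,
# keyed by the neighborhood radius r_d = r_s.
reported_n10 = {1: 10.04, 2: 25.09, 3: 43.11, 4: 69.82, 5: 96.73}


def estimated_time(radius, n_iters, reference_iters=10):
    """Rescale a reported time, assuming cost grows linearly with the iteration count."""
    return reported_n10[radius] * n_iters / reference_iters


for radius in sorted(reported_n10):
    full = reported_n10[radius]
    single = estimated_time(radius, n_iters=1)
    print(f"r={radius}: N=10 takes {full:.2f} s, N=1 is estimated at {single:.2f} s")
```

Under this assumption, a first-group task with $r_{d}=r_{s}=2$ would take roughly 2.5 seconds on the same machine.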

HDR tone mapping is a representative task in the first group. It requires decomposing the input image into a base layer and a detail layer through edge-preserving smoothing. The challenge of this task is that sharpening the edges during smoothing results in gradient reversals, whereas blurring them produces halos. Fig. 7 shows the tone mapping results using different edge-preserving smoothing operators. The results of BF (?) and GF (?) contain clear halos around the picture frames and the light fixture, as shown in Fig. 7(a) and (b). This is due to their local smoothing natures, where strong smoothing can also blur salient edges (?; ?). The $L_{0}$ norm smoothing (?) can properly eliminate halos, but there are gradient reversals in its result, as illustrated in Fig. 7(c). This is because the $L_{0}$ smoothing is prone to sharpening salient edges. The WLS (?) and SG-WLS (?) smoothing perform well in handling gradient reversals and halos in most cases. However, there are slight halos in their results, as illustrated in the left close-up in Fig. 7(d) and (e). These artifacts are properly eliminated in our results.
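
The base/detail decomposition described above can be sketched on a 1D luminance signal. This is a minimal sketch of a common tone-mapping recipe (smooth in the log domain, compress only the base layer, add the detail back), not the paper's pipeline, which is not spelled out here. The moving-average smoother is a deliberate stand-in that is not edge-preserving, and all names and the compression factor are assumptions.

```python
import math


def box_smooth(signal, radius):
    """Stand-in smoother (moving average); it blurs edges, unlike the operators compared above."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def tone_map(luminance, smooth, compression=0.5):
    """Compress the base layer of the log-luminance and keep the detail layer unchanged."""
    log_lum = [math.log10(v) for v in luminance]
    base = smooth(log_lum)                            # base layer from the smoother
    detail = [l - b for l, b in zip(log_lum, base)]   # detail layer is the residual
    top = max(base)
    mapped = [compression * (b - top) + d for b, d in zip(base, detail)]
    return [10.0 ** m for m in mapped]


# Hypothetical HDR scanline with three brightness plateaus and small variations
hdr = [0.01, 0.012, 0.011, 5.0, 5.5, 5.2, 900.0, 950.0, 920.0]
ldr = tone_map(hdr, lambda s: box_smooth(s, 1), compression=0.4)

print("input range (decades): ", math.log10(max(hdr) / min(hdr)))
print("output range (decades):", math.log10(max(ldr) / min(ldr)))
```

Because the stand-in smoother blurs the two large steps, the detail layer picks up large values next to them, which is the mechanism behind the halos discussed above. Replacing `box_smooth` with an operator that neither blurs nor sharpens edges keeps those residuals small.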

Clip-art compression artifacts removal. Clip-art images are piecewise constant with sharp edges. When they are compressed in JPEG format with low quality, edge-related artifacts appear and the edges are usually blurred, as shown in Fig. 8(a). Therefore, when removing the compression artifacts, the edges should also be sharpened in the restored image. We thus classify this task into the second group. The approach proposed by Wang et al. (?) can rarely handle heavy compression artifacts, as shown in Fig. 8(b). The $L_{0}$ norm smoothing fails to preserve weak edges, as shown in Fig. 8(c). The region fusion approach (?) is able to produce results with sharpened edges; however, it also enhances the blocky artifacts along strong edges, as highlighted in Fig. 8(d). The edges in the result of BTF (?) are blurred, as shown in Fig. 8(e). Our result is illustrated in Fig. 8(f), with edges sharpened and compression artifacts removed.

Guided depth map upsampling belongs to guided image filtering in the third group. The RGB guidance image can provide additional structural information to restore and sharpen the depth edges. The challenge of this task is the structure inconsistency between the depth map and the RGB guidance image, which can cause blurred depth edges and texture copy artifacts in the upsampled depth map. We test our method on the simulated dataset provided in (?). Fig. 9 shows the visual comparison between our result and the results of the recent state-of-the-art approaches. Our method shows better performance in preserving sharp depth edges and avoiding texture copy artifacts. Tab. 1 also shows the quantitative evaluation of the results of different methods. Following the measurement used in (?; ?; ?; ?), the evaluation is measured in terms of mean absolute error (MAE). As Tab. 1 shows, our method achieves the best or the second best performance among all the compared approaches.
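
For reference, the MAE metric used in this comparison can be computed as in the sketch below. It assumes depth maps stored as equally sized nested lists and averages over every pixel; any masking of invalid pixels used by the benchmark is not modeled, and the sample values are made up for illustration.

```python
def mean_absolute_error(estimate, ground_truth):
    """Mean absolute error between two equally sized 2D depth maps (nested lists)."""
    if len(estimate) != len(ground_truth):
        raise ValueError("depth maps must have the same number of rows")
    total, count = 0.0, 0
    for est_row, gt_row in zip(estimate, ground_truth):
        if len(est_row) != len(gt_row):
            raise ValueError("depth maps must have the same number of columns")
        for est, gt in zip(est_row, gt_row):
            total += abs(est - gt)
            count += 1
    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count


# Hypothetical 2x2 upsampled depth map and its ground truth
upsampled = [[10.0, 12.0], [11.5, 30.0]]
reference = [[10.0, 12.5], [11.0, 31.0]]
mae = mean_absolute_error(upsampled, reference)
print(f"MAE = {mae}")
```

Lower values indicate that the upsampled depth map is closer to the ground truth.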

We further validate our method on the real data introduced by Ferstl et al. (?). The real dataset contains three low-resolution depth maps captured by a ToF depth camera and the corresponding highly accurate ground-truth depth maps captured with structured light. The upsampling factor for the real dataset is $\sim 6.25\times$ . The visual comparison in Fig. 10 and the quantitative comparison in Tab. 2 show that our method can outperform the compared methods and achieve state-of-the-art performance.

Image texture removal belongs to the tasks in the fourth group. It aims at extracting salient meaningful structures while removing small complex texture patterns. The challenge of this task is that it requires structure-preserving smoothing rather than the edge-preserving smoothing required by the above tasks. Fig. 11(a) shows a classical example of image texture removal: the small textures with strong edges should be smoothed out while the salient structures with weak edges should be preserved. Fig. 11(b) $\sim$ (f) show the results of the recent state-of-the-art approaches. The joint convolutional analysis and synthesis sparse (JCAS) model (?) removes the textures well, but the resulting edges are also blurred. The RTV method (?), muGIF (?), BTF (?) and the FCN-based approach (?) cannot completely remove the textures; in addition, the weak edges of the salient structures are also smoothed out in their results. Our method can both preserve the weak edges of the salient structures and remove the small textures.

Conclusion

We propose a non-convex non-smooth optimization framework for edge-preserving and structure-preserving image smoothing. We first introduce the truncated Huber penalty function which shows strong flexibility. Then a robust framework is presented. When combined with the flexibility of the truncated Huber penalty function, our framework is able to achieve different and even contradictive smoothing behaviors using different parameter settings. This is different from most previous approaches of which the inherent smoothing natures are usually fixed. We further propose an efficient numerical solution to our model and prove its convergence theoretically. Comprehensive experimental results in a number of applications demonstrate the effectiveness of our method.

**Acknowledgement**

We gratefully acknowledge the support of the Australia Centre for Robotic Vision. This paper is also partly supported by NSFC, China (No. U1803261, 61977046), Key Research and Development Program of Sichuan Province (No. 2019YFG0409) and National Key Research and Development Project (No. 2018AAA0100702).

Bibliography

1. [Barron and Poole 2016] Barron, J. T., and Poole, B. 2016. The fast bilateral solver. In ECCV, 617–632. Springer.
2. [Buades et al. 2010] Buades, A.; Le, T. M.; Morel, J.-M.; Vese, L. A.; et al. 2010. Fast cartoon + texture image filters. TIP 19(8):1978–1986.
3. [Chan and Esedoglu 2005] Chan, T. F., and Esedoglu, S. 2005. Aspects of total variation regularized $L^{1}$ function approximation. SIAM Journal on Applied Mathematics 65(5):1817–1837.
4. [Chen, Xu, and Koltun 2017] Chen, Q.; Xu, J.; and Koltun, V. 2017. Fast image processing with fully-convolutional networks. In ICCV, volume 9, 2516–2525.
5. [Cho et al. 2014] Cho, H.; Lee, H.; Kang, H.; and Lee, S. 2014. Bilateral texture filtering. ToG 33(4):128.
6. [Durand and Dorsey 2002] Durand, F., and Dorsey, J. 2002. Fast bilateral filtering for the display of high-dynamic-range images. In ToG, volume 21, 257–266. ACM.
7. [Fan et al. 2018] Fan, Q.; Yang, J.; Wipf, D.; Chen, B.; and Tong, X. 2018. Image smoothing via unsupervised learning. In SIGGRAPH Asia 2018 Technical Papers, 259. ACM.
8. [Fan et al. 2019] Fan, Q.; Chen, D.; Yuan, L.; Hua, G.; Yu, N.; and Chen, B. 2019. A general decoupled learning framework for parameterized image operators. IEEE Transactions on Pattern Analysis and Machine Intelligence.