Feature Graph Learning for 3D Point Cloud Denoising
Wei Hu, Xiang Gao, Gene Cheung, Zongming Guo

TL;DR
This paper introduces a novel feature graph learning method that optimizes a Mahalanobis distance-based graph kernel using a single observation, significantly improving 3D point cloud denoising performance.
Contribution
It proposes an efficient optimization algorithm for learning feature graphs from minimal data, with applications to 3D point cloud denoising, achieving state-of-the-art results.
Findings
Achieves superior denoising performance compared to existing methods.
Develops a fast, eigen-decomposition-free optimization algorithm.
Demonstrates effectiveness with high-dimensional feature vectors.
Abstract
Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix , but require a large number of signal observations 's for a stable estimate. In this work, we assume instead the availability of a relevant feature vector per node , from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix by minimizing the graph Laplacian regularizer (GLR) , where edge weight is $w_{i,j} =…
| Abbreviation | Description |
|---|---|
| GLR | Graph Laplacian Regularizer |
| PD | Positive Definite |
| PSD | Positive Semi-Definite |
| PG | Proximal Gradient |
| IGMRF | Intrinsic Gaussian Markov Random Fields |
| MAP | Maximum a Posteriori |
| MLS | Moving Least Squares |
| RIMLS | Robust Implicit MLS |
| APSS | Algebraic Point Set Surfaces |
| LOP | Locally Optimal Projection |
| WLOP | Weighted LOP |
| AWLOP | Anisotropic WLOP |
| NLD | Non-Local Denoising |
| LR | Low Rank |
| Anchor | Daratech | DC | Gargoyle | Quasimoto | |
| MSE | |||||
| Ours | 0.240 | 0.301 | 0.223 | 0.256 | 0.197 |
| Diagonal | 0.251 | 0.339 | 0.241 | 0.275 | 0.205 |
| SNR | |||||
| Ours | 48.17 | 43.78 | 47.04 | 46.93 | 47.87 |
| Diagonal | 47.72 | 42.59 | 46.23 | 46.20 | 47.50 |
| Model | Noisy | APSS | RIMLS | AWLOP | NLD | MRPCA | LR | GLR | Diagonal | Baseline1 | Baseline2 | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anchor | 0.259 | 0.208 | 0.212 | 0.237 | 0.231 | 0.202 | 0.228 | 0.189 | 0.199 | 0.197 | 0.198 | 0.194 |
| Daratech | 0.245 | 0.203 | 0.209 | 0.228 | 0.222 | 0.225 | 0.213 | 0.197 | 0.198 | 0.196 | 0.195 | 0.192 |
| DC | 0.237 | 0.186 | 0.198 | 0.211 | 0.206 | 0.189 | 0.206 | 0.177 | 0.180 | 0.180 | 0.180 | 0.177 |
| Gargoyle | 0.257 | 0.208 | 0.217 | 0.230 | 0.230 | 0.215 | 0.240 | 0.202 | 0.205 | 0.204 | 0.204 | 0.200 |
| Quasimoto | 0.224 | 0.171 | 0.183 | 0.196 | 0.190 | 0.171 | 0.180 | 0.162 | 0.162 | 0.163 | 0.162 | 0.161 |
| Average | 0.244 | 0.195 | 0.203 | 0.220 | 0.215 | 0.200 | 0.213 | 0.185 | 0.189 | 0.188 | 0.188 | 0.184 |
| Anchor | 0.322 | 0.239 | 0.244 | 0.259 | 0.265 | 0.230 | 0.246 | 0.217 | 0.225 | 0.227 | 0.225 | 0.221 |
| Daratech | 0.303 | 0.242 | 0.258 | 0.298 | 0.258 | 0.259 | 0.252 | 0.238 | 0.243 | 0.244 | 0.244 | 0.236 |
| DC | 0.293 | 0.210 | 0.226 | 0.257 | 0.235 | 0.211 | 0.221 | 0.203 | 0.205 | 0.207 | 0.204 | 0.200 |
| Gargoyle | 0.318 | 0.239 | 0.252 | 0.294 | 0.262 | 0.241 | 0.257 | 0.233 | 0.233 | 0.237 | 0.233 | 0.228 |
| Quasimoto | 0.274 | 0.188 | 0.203 | 0.226 | 0.217 | 0.187 | 0.193 | 0.176 | 0.179 | 0.181 | 0.179 | 0.175 |
| Average | 0.302 | 0.223 | 0.236 | 0.266 | 0.247 | 0.225 | 0.233 | 0.213 | 0.217 | 0.219 | 0.217 | 0.212 |
| Anchor | 0.372 | 0.254 | 0.263 | 0.306 | 0.297 | 0.242 | 0.259 | 0.228 | 0.236 | 0.243 | 0.236 | 0.232 |
| Daratech | 0.348 | 0.282 | 0.308 | 0.286 | 0.295 | 0.288 | 0.283 | 0.276 | 0.280 | 0.286 | 0.284 | 0.274 |
| DC | 0.338 | 0.227 | 0.254 | 0.270 | 0.269 | 0.223 | 0.234 | 0.228 | 0.223 | 0.225 | 0.219 | 0.215 |
| Gargoyle | 0.368 | 0.262 | 0.277 | 0.297 | 0.294 | 0.257 | 0.269 | 0.257 | 0.253 | 0.259 | 0.253 | 0.245 |
| Quasimoto | 0.318 | 0.201 | 0.219 | 0.218 | 0.252 | 0.199 | 0.204 | 0.187 | 0.195 | 0.195 | 0.191 | 0.182 |
| Average | 0.348 | 0.245 | 0.264 | 0.275 | 0.281 | 0.241 | 0.249 | 0.235 | 0.237 | 0.242 | 0.237 | 0.229 |
| Anchor | 0.417 | 0.267 | 0.281 | 0.315 | 0.331 | 0.253 | 0.270 | 0.244 | 0.246 | 0.248 | 0.246 | 0.240 |
| Daratech | 0.387 | 0.350 | 0.373 | 0.359 | 0.330 | 0.325 | 0.347 | 0.308 | 0.319 | 0.326 | 0.322 | 0.301 |
| DC | 0.381 | 0.251 | 0.265 | 0.324 | 0.306 | 0.239 | 0.247 | 0.241 | 0.231 | 0.247 | 0.245 | 0.222 |
| Gargoyle | 0.412 | 0.292 | 0.305 | 0.365 | 0.334 | 0.277 | 0.281 | 0.273 | 0.266 | 0.274 | 0.268 | 0.256 |
| Quasimoto | 0.362 | 0.229 | 0.242 | 0.267 | 0.291 | 0.207 | 0.218 | 0.209 | 0.195 | 0.203 | 0.196 | 0.193 |
| Average | 0.392 | 0.278 | 0.293 | 0.326 | 0.318 | 0.260 | 0.273 | 0.255 | 0.251 | 0.260 | 0.255 | 0.242 |
| Anchor | 0.631 | 0.389 | 0.402 | 0.536 | 0.571 | 0.382 | 0.398 | 0.407 | 0.352 | 0.360 | 0.350 | 0.333 |
| Daratech | 0.533 | 0.504 | 0.542 | 0.466 | 0.508 | 0.445 | 0.431 | 0.446 | 0.404 | 0.404 | 0.402 | 0.398 |
| DC | 0.575 | 0.403 | 0.464 | 0.502 | 0.529 | 0.405 | 0.402 | 0.389 | 0.378 | 0.390 | 0.385 | 0.368 |
| Gargoyle | 0.619 | 0.444 | 0.475 | 0.535 | 0.564 | 0.423 | 0.428 | 0.438 | 0.419 | 0.428 | 0.426 | 0.416 |
| Quasimoto | 0.561 | 0.402 | 0.414 | 0.473 | 0.523 | 0.388 | 0.387 | 0.356 | 0.293 | 0.312 | 0.304 | 0.286 |
| Average | 0.584 | 0.428 | 0.459 | 0.502 | 0.539 | 0.409 | 0.409 | 0.407 | 0.369 | 0.379 | 0.373 | 0.360 |
| Model | Noisy | APSS | RIMLS | AWLOP | NLD | MRPCA | LR | GLR | Diagonal | Baseline1 | Baseline2 | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anchor | 47.41 | 49.61 | 49.41 | 48.31 | 48.53 | 49.88 | 48.69 | 50.55 | 50.03 | 50.14 | 50.11 | 50.30 |
| Daratech | 45.85 | 47.71 | 47.44 | 46.56 | 46.82 | 46.72 | 47.27 | 48.02 | 47.96 | 48.08 | 48.12 | 48.29 |
| DC | 46.42 | 48.83 | 48.23 | 47.59 | 47.82 | 48.68 | 47.83 | 49.34 | 49.19 | 49.16 | 49.16 | 49.33 |
| Gargoyle | 46.91 | 49.01 | 48.57 | 48.01 | 48.01 | 48.66 | 47.57 | 49.30 | 49.18 | 49.20 | 49.19 | 49.40 |
| Quasimoto | 46.61 | 49.27 | 48.60 | 47.92 | 48.22 | 49.27 | 48.78 | 49.81 | 49.85 | 49.77 | 49.82 | 49.90 |
| Average | 46.67 | 48.88 | 48.44 | 47.67 | 47.87 | 48.64 | 48.02 | 49.40 | 49.24 | 49.27 | 49.28 | 49.44 |
| Anchor | 45.25 | 48.24 | 48.00 | 46.69 | 47.16 | 48.60 | 47.91 | 49.20 | 48.82 | 48.72 | 48.83 | 48.99 |
| Daratech | 43.70 | 46.00 | 45.46 | 45.12 | 45.34 | 45.18 | 45.59 | 46.13 | 45.92 | 45.89 | 45.91 | 46.22 |
| DC | 44.32 | 47.64 | 46.94 | 46.04 | 46.49 | 47.62 | 47.10 | 47.94 | 47.84 | 47.76 | 47.90 | 48.11 |
| Gargoyle | 44.75 | 47.63 | 47.12 | 46.39 | 46.68 | 47.52 | 46.88 | 47.87 | 47.89 | 47.70 | 47.86 | 48.09 |
| Quasimoto | 44.58 | 48.34 | 47.57 | 46.53 | 46.89 | 48.40 | 48.09 | 49.00 | 48.83 | 48.72 | 48.82 | 49.06 |
| Average | 44.52 | 47.57 | 47.01 | 46.15 | 46.51 | 47.46 | 47.11 | 48.02 | 47.86 | 47.76 | 47.86 | 48.09 |
| Anchor | 43.78 | 47.60 | 47.27 | 45.74 | 46.02 | 48.09 | 47.41 | 48.67 | 48.34 | 48.04 | 48.35 | 48.51 |
| Daratech | 42.34 | 44.46 | 43.58 | 44.32 | 43.98 | 44.25 | 44.41 | 44.64 | 44.53 | 44.30 | 44.36 | 44.73 |
| DC | 42.86 | 46.84 | 45.71 | 45.11 | 45.15 | 47.00 | 46.54 | 46.80 | 47.03 | 46.93 | 47.21 | 47.38 |
| Gargoyle | 43.31 | 46.69 | 46.14 | 45.44 | 45.53 | 46.88 | 46.44 | 46.89 | 47.05 | 46.81 | 47.06 | 47.37 |
| Quasimoto | 43.09 | 47.68 | 46.80 | 46.85 | 45.40 | 47.80 | 47.52 | 48.40 | 48.00 | 47.97 | 48.20 | 48.67 |
| Average | 43.07 | 46.65 | 45.90 | 45.49 | 45.21 | 46.80 | 46.46 | 47.08 | 46.99 | 46.81 | 47.04 | 47.33 |
| Anchor | 42.65 | 47.09 | 46.59 | 45.44 | 44.94 | 47.64 | 46.99 | 48.00 | 47.90 | 47.80 | 47.92 | 48.17 |
| Daratech | 41.28 | 42.29 | 41.64 | 42.02 | 42.87 | 43.03 | 42.37 | 43.56 | 43.21 | 42.99 | 43.11 | 43.80 |
| DC | 41.68 | 45.86 | 45.30 | 43.27 | 43.85 | 46.33 | 46.01 | 46.24 | 46.68 | 45.98 | 46.06 | 47.07 |
| Gargoyle | 42.17 | 45.61 | 45.18 | 43.37 | 44.28 | 46.12 | 45.99 | 46.28 | 46.56 | 46.25 | 46.45 | 46.93 |
| Quasimoto | 41.79 | 46.36 | 45.83 | 44.83 | 43.99 | 47.39 | 46.85 | 47.28 | 47.96 | 47.55 | 47.94 | 48.08 |
| Average | 41.91 | 45.44 | 44.91 | 43.79 | 43.99 | 46.10 | 45.64 | 46.27 | 46.46 | 46.11 | 46.30 | 46.81 |
| Anchor | 38.52 | 43.35 | 43.04 | 40.13 | 39.51 | 43.52 | 43.12 | 42.89 | 44.33 | 44.10 | 44.38 | 44.87 |
| Daratech | 38.10 | 38.66 | 37.92 | 39.43 | 38.57 | 39.88 | 40.24 | 39.86 | 40.84 | 40.82 | 40.88 | 41.00 |
| DC | 37.57 | 41.14 | 39.72 | 38.91 | 38.38 | 41.07 | 41.13 | 41.45 | 41.71 | 41.39 | 41.54 | 42.00 |
| Gargoyle | 38.12 | 41.44 | 40.78 | 39.57 | 39.03 | 41.91 | 41.79 | 41.56 | 41.97 | 41.76 | 41.82 | 42.06 |
| Quasimoto | 37.43 | 40.78 | 40.49 | 39.13 | 38.13 | 41.11 | 41.13 | 41.96 | 43.88 | 43.26 | 43.53 | 44.14 |
| Average | 37.95 | 41.07 | 40.39 | 39.43 | 38.72 | 41.50 | 41.48 | 41.54 | 42.55 | 42.27 | 42.43 | 42.81 |
| Anchor | Daratech | DC | Gargoyle | Quasimoto | |
|---|---|---|---|---|---|
| Vanilla PG | 0.240 | 0.301 | 0.222 | 0.256 | 0.191 |
| Ours | 0.240 | 0.301 | 0.222 | 0.256 | 0.193 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
Feature Graph Learning for 3D Point Cloud Denoising
Wei Hu, Xiang Gao, Gene Cheung, and Zongming Guo W. Hu (e-mail: [email protected]), X. Gao (e-mail: [email protected]), and Z. Guo (e-mail: [email protected]) are with Wangxuan Institute of Computer Technology, Peking University, No. 128 Zhongguancun North Street, Beijing, China. G. Cheung (e-mail: [email protected]) is with York University, 4700 Keele Street Toronto, Ontario, Canada.
Abstract
Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix , but require a large number of signal observations ’s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector per node , from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix by minimizing the graph Laplacian regularizer (GLR) , where edge weight is , given a single observation . We optimize diagonal entries via proximal gradient (PG), where we constrain to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of . To keep PD, we constrain the Schur complement of sub-matrix of to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of , thus ensuring fast computation speed even when feature vector has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.
Index Terms:
Graph learning, Mahalanobis distance, graph Laplacian regularizer, 3D Point cloud denoising
I Introduction
Graphs are flexible mathematical structures modeling pairwise relations between data entities, such as brain networks, social networks, computer networks and transportation networks. Nodes in a graph represent data collecting entities (e.g., users in a social group or sensors in a wireless network) while edges connecting nodes describe pairwise affinities. A scalar (weight) is often assigned to each edge, which reflects the degree of pairwise similarity between two nodes. In settings where the graph is not readily available, it is critical to first identify an appropriate underlying graph kernel—a process commonly called graph learning [1, 2, 3]—before it is used for many recent graph spectral signal restoration schemes, including image denoising, dequantization, deblurring, and contrast enhancement [4, 5, 6, 7].
Existing graph learning methods111We focus on learning of undirected graphs when one or fewer signal observation is available in this paper, while learning of directed graphs [3] is left for future work. can be roughly classified into two main categories: statistical methods and graph spectral methods. The common assumption behind statistical methods is that multiple data observations generated from the same probability model are available to determine the model parameters, which describe an underlying graph. Graph learning is then essentially the problem of estimating the inverse covariance or precision matrix given sufficient empirical data, with the addition of some prior topological information (e.g., sparsity) [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. However, the requirement of multiple signal observations is not applicable to scenarios where only a single observed signal is available.
On the other hand, graph spectral learning methods offer an alternative (or additional) signal representation perspective, where observed signals are assumed to lie in a low-dimensional subspace spanned by the low frequency components of the underlying graph topologies [18, 19, 20, 21, 22, 23, 24, 25]. Specifically, the frequency components are eigenvectors of a chosen variational operator on graphs like the adjacency matrix or the graph Laplacian matrix [26]. This “low-pass” spectral assumption translates to additional constraints during graph learning, potentially leading to more accurate graph estimates when there are few signal observations.
Extending on these previous works [21, 24, 27], in this paper we study spectral graph learning when the number of signal observations is extremely small—just one observation or even fewer (i.e., partial observation of one signal). This is typically the case for image restoration applications with non-stationary statistics [4, 5, 6, 7], where the underlying graph for a target image patch needs to be estimated for graph spectral processing given just one noisy and/or partial patch observation. To ease the ill-posedness of the problem, we assume the availability of a relevant feature vector per node , (e.g., the color pixel intensities), and that an edge weight is an inverse function of the feature distance (i.e., larger the inter-node distance, smaller the edge weight). Many previous graph constructions including bilateral filter [28, 29, 30, 4, 27] implicitly assume some notion of feature distance when assigning edge weights; our work is a more formal study of feature metric learning in a rigorous mathematical setting.
Specifically, we assume an edge weight , where is the Mahalanobis distance metric matrix [31] given feature difference for the two connected nodes and . Given a single observation , we seek to minimize the Graph Laplacian Regularizer (GLR) [4] using , which measures the smoothness of the signal with respect to the graph Laplacian matrix . Note that because the feature dimension is often much smaller than signal dimension , variable has entries only, while in general Laplacian matrix has entries, resulting in more stable parameter estimation given the same observed data.
To find the optimal , we alternately optimize the diagonal and off-diagonal entries. When optimizing diagonal entries, we derive linear inequalities based on the Gershgorin circle theorem (GCT) [32] to keep positive definite (PD) and then employ proximal gradient (PG) descent [33] to acquire a solution. When optimizing off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of until convergence. To keep PD, we constrain the Schur complement of to be PD according to the Haynsworth inertia additivity [34], which is enforced using PG descent. Further, we relax the PD constraint via vector-norm bounding to avoid matrix inverse computation during the optimization. Our algorithm also mitigates full eigen-decomposition of , thus ensuring fast computation speed even when the feature dimension is large.
To validate the usefulness of our proposed feature graph learning algorithm, we apply it to the problem of 3D point cloud denoising. Point clouds provide efficient representation for arbitrarily-shaped objects, which consist of a set of irregularly-spaced points. The maturity of depth sensing and 3D laser scanning techniques222Commercial products include Microsoft Kinect, LiDAR, Intel RealSense, etc. enables convenient acquisition of 3D point clouds, which have a variety of applications such as 3D immersive tele-presence, navigation for autonomous vehicles, free-viewpoint rendering, and heritage reconstruction [35]. However, point clouds are often perturbed by noise, which comes from hardware, software or other causes.
To denoise point clouds, we first assume that local patches are self-similar [36, 37] and connect corresponding points into a graph. Assuming a first-order intrinsic Gaussian Markov random field (IGMRF) model [38], we pose a Maximum a Posteriori (MAP) estimation problem with GLR as signal prior. Interpreting the precision matrix in the IGMRF model as a graph Laplacian, we employ the feature graph learning to optimize edge weights, where for each point we employ 3D coordinates and surface normals as relevant features. Finally, we optimize the point cloud and the underlying graph alternately until convergence. Extensive experiments show that we achieve state-of-the-art performance compared to competing methods [39, 40, 41].
To summarize, the main contributions of our works are:
To identify an appropriate underlying graph given a single signal observation , we formulate a feature graph learning problem by minimizing the GLR using the Mahalanobis distance metric matrix as variable, assuming feature vector per node is available. 2. 2.
We develop a fast block descent algorithm to optimize the feature metric matrix , while keeping positive definite and mitigating full matrix eigen-decomposition and large matrix inverse; 3. 3.
We employ feature graph learning to 3D point cloud denoising, where the graph for each set of self-similar patches is computed from 3D coordinates and surface normals as features, resulting in superior denoising performance.
The paper is organized as follows. We first review previous works on graph learning and point cloud denoising in Section II. Then we introduce basic concepts in graph spectral processing in Section III. In Section IV, we describe the proposed problem formulation and algorithm development for feature graph learning. We then apply to the problem of point cloud denoising in Section V. Finally, experimental results and conclusions are presented in Section VI and VII, respectively.
II Related Work
We overview previous works on graph learning and point cloud denoising in order.
II-A Graph Learning
Previous graph learning methods can be divided into two main categories: statistical methods and graph spectral methods.
Statistical methods: In graphical models including Markov random fields [38] and Bayesian networks [42], edges in the graph encode conditional dependencies among random variables represented as nodes. Learning the graph structure amounts to learning the inverse covariance or precision matrix for such models. Dempster [11] proposed to introduce zero entries in inverse covariance matrices for simplified covariance estimation. The estimation of a sparse inverse covariance matrix was then studied in several works [12, 13, 8]. Friedman et al. formulated sparse inverse covariance estimation with a regularization framework and developed the Graphical Lasso algorithm to address the regularized optimization problem [10]. Some algorithmic extensions of the Graphical Lasso are presented in [9, 14], and a few computationally efficient variations are discussed in [15, 16, 17]. However, inverse covariance estimation methods assume many observations of a graphical model, which is not applicable for many imaging applications.
Graph spectral methods: The key idea is to enforce low frequency representation of observed signals as well as constraints for a valid graph Laplacian matrix. Tenenbaum et al. [18] proposed to learn combinatorial graph Laplacians using a proposed sparse model. A regression framework was presented in [43] to learn a graph Laplacian matrix based on a fitness metric between the signals and the graph, which essentially evaluates the smoothness of the signals on the graph. Dong et al. [21] and Kalofolias [19] proposed to learn Laplacian matrices from the smoothness prior of the graph signal. Egilmez et al. [44] proposed graph learning under pre-defined graph structural and graph Laplacian constraints. Yang et al. [27] computed optimal feature weights in a similarity graph given a restored binary classifier signal. This is an earlier version of our feature metric learning, but restricts the search space only to diagonal matrices, which limits its effectiveness.
Orthogonally, some studies focus on inferring graph topologies from signals that are diffused on a graph over time. A fitness metric similar to the regression framework was employed in [45] to learn a valid graph topology. In particular, Segarra et al. [22] and Pasdeloup et al. [23] focused on learning graph shift/diffusion operators (such as adjacency matrices) from a set of diffused graph signals. Sardellitti et al. [20] proposed to learn the graph topology from data under the assumption of band-limited signals, which corresponded to signals with clustering properties. Nonetheless, this class of methods also assume a large number of signal observations for a stable estimate.
II-B Point Cloud Denoising
Point cloud denoising methods mainly include Moving Least Squares (MLS) based methods, Locally Optimal Projection (LOP) based methods, sparsity based methods, non-local based methods and graph-based methods.
MLS-based methods: MLS-based methods approximate the point cloud with a smooth surface and then project the points of the point cloud onto the fitted surface. [46] used the MLS projection operator to calculate the optimal MLS surface of the point cloud, and moved the points around the surface to the MLS surface. [47] proposed a MLS-based spherical fitting denoising method (APSS). Compared with the aforementioned MLS projection-based algorithm, this method improved the stability at low sampling rate and high curvature. [48] proposed an algorithm based on improved MLS and local kernel regression to smooth the point cloud surface (RIMLS). However, these MLS-based methods are often sensitive to outliers.
LOP-based methods: The widely known LOP [49] aimed to produce a set of points to represent the underlying surface while enforcing a uniform distribution over the point cloud. Weighted LOP (WLOP) [50] provided a more uniformly distributed output than LOP by adapting a repulse term to the local density. Further, anisotropic WLOP (AWLOP) [51] modified WLOP with an anisotropic weighting function in order to preserve sharp features better. Nevertheless, LOP-based methods often suffer from over-smoothing.
Sparsity based methods: These methods are based on sparse representation theory [52], and generally involve two phases. In the first phase, the sparse reconstruction of the cloud normals is obtained by solving the global minimization problem of sparse regularization. In [53] regularization was adopted, while [54] used regularization to seek more characteristic sparsity. In the second phase, each point position is updated by solving global (or ) minimization problem based on the reconstructed normals and local planarity hypothesis. The recently proposed method called Moving Robust Principal Components Analysis (MRPCA) [39] used weighted minimization of the point deviations from the local reference plane to preserve sharp features. However, when the noise level is high, over-smoothing or over-sharpening tends to occur [54].
Non-local based methods: These approaches exploit the non-local similarities among patches in a point cloud. In [55], an extended non-local denoising (NLD) algorithm was introduced to process point clouds, where the neighborhood of each point was described by the polynomial coefficients of the local MLS surface to compute point similarity. [56] and [57] applied a scale space scheme and non-local means denoising algorithm. [37] extended the BM3D [58] algorithm to point cloud denoising, searched similar patches globally via Iterative Closest Point (ICP) [59] algorithm, and then combined them into a collaborative group for denoising. [40] utilized patch self-similarity and optimized for a low rank (LR) dictionary representation of the extracted patches to smooth 3D patches. However, the computational complexity of such methods is often high due to the global search.
Graph-based methods: This class of methods interpret a point cloud as a signal on a graph, and perform denoising via chosen graph filters. In [60], the input point cloud was represented as a signal on a -nearest-neighbor graph and then denoised via a convex optimization problem regularized by the gradient of the point cloud on the graph. In [61], a reweighted graph Laplacian regularizer for surface normals was designed, with a general -norm fidelity term that modeled two types of additive noise. Moreover, they established a linear relationship between normals and 3D point coordinates via bipartite graph approximation for ease of optimization. [41] proposed graph Laplacian regularization (GLR) of a low dimensional manifold model (LDMM), and sought self-similar patches to denoise them simultaneously. Instead of directly smoothing the coordinates or normals of 3D points, [62] estimated a local tangent plane at each 3D point based on a graph and then reconstructed each 3D point by weighted averaging of its projections on multiple tangent planes.
Our proposed approach belongs to the family of graph-based methods. The key difference is that edge weights in our graph are not pre-defined with hand-crafted parameters, but optimized rigorously via feature metric learning given available signal(s) assumed to be smooth with respect to the graph.
III Background on Spectral Graph Theory
We first review basic concepts in spectral graph theory [26] that are essential in our feature graph learning and point cloud denoising algorithms.
III-A Graph and Graph Laplacian
We consider an undirected graph composed of a node set of cardinality , an edge set connecting nodes, and a weighted adjacency matrix . Each edge is associated with a non-negative weight which reflects the degree of similarity between nodes and .
Among different variants of Laplacian matrices, in this paper we employ the combinatorial graph Laplacian [63, 64, 30] defined as , where is the degree matrix—a diagonal matrix where .
III-B Graph Laplacian Regularizer
Graph signal refers to data that resides on the nodes of a graph, such as functionality of regions on a neural network and temperatures on a sensor network.
A graph signal defined on a graph is smooth with respect to [65] if
[TABLE]
where is a small positive scalar. To satisfy (1), connected node pair and must be similar for a large edge weight ; for a small , and can differ significantly. Hence, (1) forces to adapt to the topology of , and is commonly called the graph Laplacian Regularizer (GLR) [29, 4]. This prior also has a frequency interpretation:
[TABLE]
where is the -th eigenvalue of and is commonly interpreted as the -th graph frequency, and is the inner-product between the corresponding -th eigenvector and signal . In other words, is the energy in the -th graph frequency for signal . Thus, a small means that most signal energies are in the low graph frequencies, or is roughly low-pass.
III-C Signal-Dependent Graph Laplacian Regularizer
In the aforementioned GLR, the graph Laplacian is fixed, which does not promote reconstruction of the target signal with discontinuities if the corresponding edge weights are not very small. It is thus extended to signal-dependent GLR in [5, 4, 6] by considering as a function of the graph signal . Specifically, an edge weight is an inverse function of the inter-node pixel intensity difference, e.g., . The reweighted prior is defined as
[TABLE]
where is the -th element of the corresponding adjacency matrix .
It has been shown in [6, 4, 5] that minimizing the signal-dependent GLR iteratively can promote piecewise smoothness (PWS) in the reconstructed graph signal , assuming that edge weights are appropriately initialized. In our more general setting, given a feature vector per node—which may include the signal intensity as one feature—our work can be considered a general case that includes signal-dependent GLR as a special case, where we compute the best feature graph via an optimization of the Mahalanobis distance metric.
IV Feature Metric Learning
IV-A Problem Formulation
Conceptually, an edge weight reflects the similarity between samples at nodes and ; specifically, using the commonly used Gaussian kernel [29], edge weight , where denotes the estimated feature distance between samples and . One advantage of the Gaussian kernel is that edge weight is in range , ensuring the resulting combinatorial graph Laplacian matrix to be positive semi-definite (PSD) [66].
The feature distance between two samples essentially measures the inter-sample similarity. As one well-known example of feature distance, consider the bilateral filter in image denoising [28] that employs pixel intensities and pixel locations as relevant features to compute , namely,
[TABLE]
where and are parameters. Defining , we can rewrite (4) in matrix form as:
[TABLE]
[28] shows that with appropriate parameters and , the bilateral filter can achieve very good edge-preserving image denoising performance. How to best determine and , however, was left unanswered.
More generally, associated with each sample is a length- vector of relevant features, and our goal is to compute an optimal Mahalanobis distance for the given features:
[TABLE]
where is a positive definite (PD) matrix333PD is desirable, so that if , then .. As a special case, when is a diagonal matrix with strictly positive diagonal entries, the definition in (8) defaults to that in [27]. Diagonal can capture the relative importance of individual features when computing , but fails to capture possible cross-correlation among features, and thus is sub-optimal in the general case.
IV-A1 Importance of Off-diagonal Terms in
For completeness, we illustrate the importance of off-diagonal terms via the following analysis and example. Fundamentally, a real symmetric matrix is normal and thus diagonalizable, i.e., it can be eigen-decomposed into the following form:
[TABLE]
where is a diagonal matrix with eigenvalues along its diagonal, and contains the corresponding eigenvectors as columns. Although the spectral theorem requires to be a unitary matrix, more generally, we can interpret (9) to mean that symmetric real matrix generalizes any diagonal matrix by pre- and post-multiplying it by any chosen square matrix and its transpose. To demonstrate the importance of this generalization, consider the following simple example. Define first the difference vector as the difference between feature vectors and , i.e., . Thus, given metric , the Mahalanobis distance between nodes and is computed as . Suppose now that there are only two available features for every node . Suppose also that the optimal metric in this case computes the difference of the two components in the difference vector , i.e.,
[TABLE]
In this case, two nodes and with difference vector having the same component will result in a Mahalanobis distance of , no matter how large is. On the other hand, any non-zero PSD diagonal matrix , where or , will lead to a distance as . Thus, we can conclude that a diagonal-only metric can be arbitrarily worse than the optimal metric with off-diagonal terms, and off-diagonal terms for metric are essential in computing feature distances.
To demonstrate that the above 2-feature example is not contrived, consider the following concrete application. Suppose the first and second features, and , measure the -location of a train on a line track at time [math] and time , respectively. Suppose the optimal metric considers only the difference in velocity, , of the two trains and , i.e.,
[TABLE]
which is the same as (10). Clearly, if the two trains and have the same velocity, the Mahalanobis distance between them should be zero, regardless of their difference in start / end locations. Using any diagonal-only metric, however, would compute a Mahalanobis distance that is a function of the difference between their start / end locations, which is incorrect.
IV-A2 Formulation
We can now pose an optimization problem for with GLR (3) as objective: we seek the optimal metric that yields the smallest GLR term given feature vector for each node , and one (or more) signal observation(s) . Specifically, denote by the inter-node sample difference square of observation , we have
[TABLE]
If there are more than one signal observations available, then in (13) can be easily generalized to be the sum of inter-node sample difference squares of all observations, i.e. .
Minimizing (13) directly would lead to one pathological solution, i.e., , resulting in edge weights . Topologically, this means nodes in the graph are all isolated, defeating the goal of finding a similarity graph. To avoid this solution, we constrain the trace of to be smaller than a constant parameter , resulting in
[TABLE]
One naïve approach to the optimization problem in (14) using proximal gradient descent [33] is as follows. The objective in (14) itself is convex and differentiable with respect to , and thus a gradient descent step can be computed. The constraints in (14) describe a feasible solution space that is a convex cone of all PD matrices with trace upper-bounded by . One can thus rewrite the constraints as a second objective term that evaluates to 0 if is in the convex set and otherwise. This convex but non-differentiable objective term has the following proximal mapping : orthogonally project the eigenvalues ’s of into the convex set: , . This results in a proximal gradient step: , where is a chosen step size.
However, this naïve realization of proximal gradient requires eigen-decomposition of per iteration with complexity , which is computation-expensive for large . To completely circumvent eigen-decomposition, we rewrite the PD cone constraint as a set of linear constraints, which form another convex set (a polytope) that is much easier to solve. In particular, our strategy is to optimize ’s diagonal and off-diagonal entries alternately until convergence. We discuss the two optimizations in order next.
IV-B Optimization of Diagonal Entries
When ’s diagonal entries are optimized, (14) can be simplified as follows. Let , . Further, let matrix be with only the off-diagonal entries, i.e., . Then,
[TABLE]
IV-B1 Gershgorin-based reformulation
To enforce the positive definiteness constraint using simple linear constraints, we leverage on the Gershgorin circle theorem (GCT) [32] and constrain each Gershgorin disc corresponding to each row of to reside in strictly positive territory. Specifically, disc has center and radius . To keep in positive territory, we need the left-end to be positive. (15) thus becomes:
[TABLE]
We can further simplify the objective by defining , resulting in
[TABLE]
IV-B2 Proximal Gradient algorithm
To solve (17) efficiently, we employ a proximal gradient (PG) approach [33]. Let . The linear constraints for form a convex set:
[TABLE]
Then, we define the indicator function :
[TABLE]
We now rewrite the optimization for as an unconstrained problem by exchanging the convex set constraint with indicator function in the objective:
[TABLE]
The first term is convex with respect to and differentiable, while the second term is convex but non-differentiable. we can thus employ PG to solve (20) as follows.
We first compute the gradient of the first term with respect to :
[TABLE]
We next define a proximal mapping for the second term—indicator function —which is a projection onto the convex set , i.e.,
[TABLE]
where , and is any positive root of [67].
Each iteration of the PG algorithm can be now written as:
[TABLE]
where is the step size. As discussed in [33], the algorithm will converge with rate for a fixed step size , where is a Lipschitz constant that requires computation of the Hessian of . In our experiment, we choose a small step size empirically, which is small enough to satisfy the Lipschitz smoothness of the objective function. In the first iteration, we initialize to be a diagonal matrix with each diagonal entry , thus ensuring is PD and .
Further, we may reduce the complexity of the proposed algorithm via accelerated proximal gradient (APG) [68, 69]. APG is able to accelerate the convergence, by first extrapolating a point from the current point and the previous point and then performing a proximal gradient step. We leave this as our future work.
IV-C Optimization of Off-diagonal Entries
For off-diagonal entries of , we develop a block coordinate descent algorithm, which optimizes one row / column at a time.
IV-C1 Block Coordinate Iteration
First, we divide into four sub-matrices:
[TABLE]
where , , and . The assumption that is symmetric means . Our strategy is to optimize one row and column of off-diagonal entries represented by in one iteration, given objective and constraints in (14). In the next iteration, a different row and column is selected, and with appropriate rows and columns reordering, the optimization variable can still reside in the first row and column as shown in (24).
By the Haynsworth inertia additivity [34], a symmetric real matrix is PD if and only if both its sub-matrix and its corresponding Schur complement are PD. Hence, we can ensure is PD by constraining the Schur complement to be positive, given that matrix , and therefore sub-matrix , are both PD from the previous iteration.
In the first iteration, we initialize to be a diagonal matrix with diagonal entries as optimized in Sec. IV-B. In each subsequent iteration, we impose a positivity constraint on the Schur complement of a submatrix as follows:
[TABLE]
Optimization problem (14) thus becomes:
[TABLE]
Given is fixed in (26), we can simplify the objective as follows. Writing in terms of its four sub-matrices in (24), we simplify the objective via matrix multiplication as
[TABLE]
where the reused notation is a constant as and are fixed in the iteration. denotes the first entry in vector , and denotes the remaining entries.
IV-C2 -bounded Reformulation
Computation of a large matrix inverse in (26) per iteration is costly. To avoid computing , we derive a bound based on the largest eigenvalue of to ensure the positivity constraint on the Schur complement in (26) is satisfied.
First, since is a real and symmetric PD matrix, it admits diagonalization with eigen-matrix (eigenvectors as columns) and diagonal matrix with real eigenvalues along its diagonal. Hence,
[TABLE]
which is essentially scaling the -norm of by eigenvalues in . Hence, a sufficient condition to the first constraint in (26) is to bound with the maximum eigenvalue of :
[TABLE]
is the reciprocal of the minimum eigenvalue of , i.e., . We employ Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) [70] to calculate , which is efficient to compute the extreme eigen-pairs of a large sparse matrix with linear convergence.
Having computed from without eigen-decomposition, we can reformulate (26) as
[TABLE]
Next, we design an efficient algorithm to address (30).
IV-C3 PG Algorithm
In each iteration of the block descent algorithm, we hold the diagonal entries fixed and optimize in (30).
The objective is convex while the lone constraint for forms a convex set. In particular, the constraint in (30) reduces to
[TABLE]
which is a -dimensional norm ball with radius . We now define an indicator function :
[TABLE]
We can rewrite the optimization for as an unconstrained problem by exchanging the convex set constraint with the indicator function in the objective:
[TABLE]
The first term is convex with respect to and differentiable, while the second term is convex but non-differentiable. Hence, we employ again the PG algorithm to solve (33).
Specifically, we first compute the gradient of the first term in the objective with respect to as
[TABLE]
which will be adopted in the step of gradient descent. We define a proximal mapping for , which is a projection onto the norm ball with radius :
[TABLE]
Then each iteration in the PG algorithm can be written as
[TABLE]
where is the step size as discussed in Sec. IV-B. We may also deploy APG to reduce the complexity further.
Finally, we analyze the convergence of our algorithm. The proposed alternating optimization algorithm optimizes diagonal and off-diagonal terms in in turn. When computing a solution for diagonal or off-diagonal terms, we adopt the newly computed solution only if the objective strictly decreases. Further, using an exponential kernel to compute edge weights means that any metric would always result in non-negative edge weights. This results in a PSD graph Laplacian [66], and our GLR objective (14) is lower-bounded by [math]. Hence, our algorithm strictly decrements an objective iteratively that is lower-bounded by [math], and our algorithm converges to a locally optimal solution.
V Feature Metric Learning for Point Cloud Denoising
Having described our feature graph learning scheme, we now employ it for 3D point cloud denoising. We first propose a patch-based model and designate graph connectivities over similar patches in a neighborhood. Then we formulate an inverse problem for point cloud denoising with GLR for regularization, which can be interpreted alternatively assuming an IGMRF model. Finally, we develop an alternating algorithm to efficiently solve the formulated problem.
V-A Patch-based Model
We assume an additive noise model for a point cloud, namely
[TABLE]
where denotes the observed noise-corrupted 3D coordinates of the target point cloud with points, is the ground truth 3D coordinates of the point cloud, and is an additive noise.
Modeling of depends on the actual point cloud acquisition mechanism. There exist a wide range of point cloud acquisition systems at different price points—from consumer-level depth sensors like Intel RealSense444https://www.intelrealsense.com/ costing 150 USD to high-end outdoor scanners like Teledyne Optech555https://www.teledyneoptech.com/en/products/static-3d-survey/ that cost up to 250,000 USD—and defining accurate noise models for all of them is difficult. We thus select the most common Gaussian noise model, which has been shown to be reasonably accurate for popular depth cameras like Microsoft Kinect [71, 72]. Hence, in (37) represents zero-mean additive white Gaussian noise (AWGN) with standard deviation , i.e.,
[TABLE]
In image denoising, a simple and effective assumption is self-similarity (also known as nonlocal means (NLM)) [36, 58, 37]: similar pixel patches exist throughout the same image, which can be searched and gathered for joint denoising. To exploit self-similarity also in point clouds, we also divide into patches. However, defining inter-patch similarity in a point cloud is not straightforward, because a point cloud is a collection of irregularly sampled points in 3D space666[41] formally defined patch similarity based on projections on parallel planes, but it is computation-intensive.. Instead, our work circumvents explicit search for similar patches.
Mimicking a fixed-size image patch with a center pixel, we first define a patch in a point cloud as done in [41]:
Definition 1. A patch in a point cloud is a local point set of points, consisting of a center point and its -nearest-neighbors in terms of Euclidean distance.
We divide the input point cloud into a set of overlapping patches, each with a center point . The patch centers are selected from a subset of points in . Intuitively, uniformly distributed center points are preferred, since they efficiently cover the entire point cloud. Hence, we employ the uniform sampling method in [73]. is empirically set, where and .
Next, for each center point we construct a patch by identifying ’s nearest neighbors. To exploit the assumed similarity among patches for denoising, we align patches via translation so each patch has its center at the origin. This results in patch set :
[TABLE]
where is a selection matrix to choose points from to form patches, each with points. Specifically, each row in contains only 0s except one 1 to choose one point in . denotes the coordinates of patch centers.
V-B Proposed Graph Connectivity
We connect two adjacent patches as follows. Two patches are considered adjacent if their centers are -nearest neighbors. Specifically, we use the -nearest-neighbor (NN) algorithm to search the nearest patches of each patch as the neighbors based on the Euclidean distance between patch centers. Then we build a graph over each pair of adjacent patches. Overall, this leads to a -nearest-patch graph on the entire point cloud.
Each point in one patch is then connected to its corresponding point in the other patch. For simplicity, we treat a pair of points in adjacent patches as corresponding points if their coordinates relative to their respective centers are closest to each other. Namely, for each point , we search the corresponding nearest point , which has the smallest Euclidean distance to :
[TABLE]
We do not explicitly connect points within the same patch, though two points in a patch may nonetheless be connected because they are also corresponding points in two different patches due to patch overlaps, as shown in Fig. 1.
We note that this is one possible graph connectivity among many. For example, one can in addition enable intra-patch filtering by drawing connections among points in the same patch [61, 62], resulting in a denser graph. For simplicity, we employ the most basic graph connectivity given inter-patch similarities. More sophisticated graph constructions considering also intra-patch filtering is left for future work.
Next, we formulate the problem of point cloud denoising via MAP estimation based on the chosen graph connectivities.
V-C MAP Formulation for Point Cloud Denoising
We pose a MAP estimation problem for the underlying patches : given the observed patches , find the most probable signal ,
[TABLE]
where is the likelihood function, and is the prior probability distribution of .
V-C1 Likelihood function
Since patches are extracted from the observed point cloud, is equivalent to . We thus define the likelihood function according to the additive Gaussian noise model in (37) and (38):
[TABLE]
where is a parameter.
V-C2 Prior probability distribution
We employ signal-dependent GLR as the prior, namely,
[TABLE]
where is a parameter, and is the Mahalanobis distance matrix satisfying constraints in (14). This prior essentially enforces the smoothness of patch signals with respect to the underlying graph .
We provide an alternative interpretation of GLR next. Specifically, we model the self-similarity among chosen patches in point clouds via first-order IGMRFs on irregular lattices. The definition of IGMRF is as follows [38]:
Definition 2. Let be an symmetric positive semi-definite matrix with rank . Then is an intrinsic GMRF of order with parameters , if its density is
[TABLE]
where denotes the generalized determinant (the product of non-zero eigenvalues). An intrinsic GMRF of order is also known as an improper GMRF of rank .
Further, is an intrinsic GMRF with respect to a graph , where
[TABLE]
In our context, we only consider the case where the graph is loopless and connected, leading to first-order IGMRF modelling of point clouds (i.e., ) [26].
Specifically, for patches translated to the origin , we model the difference between corresponding points and in adjacent patches. As the coordinates along each axis are independent, we consider each component separately. Taking the -axis coordinate and for instance, we define a normal increment
[TABLE]
where is a precision parameter, and is a positive and symmetric weight we incorporate for each pair of neighboring nodes and . The joint density then becomes
[TABLE]
As derived in [38], the corresponding precision matrix has the following form:
[TABLE]
Hence, under the first-order IGMRF model on irregular lattices where we assume normal increments between corresponding points, the prior distribution of is
[TABLE]
where . Here has the specific form as in (48).
Further, comparing the specific form of in (48) and the definition of the combinatorial graph Laplacian in Sec. III-A, we have
[TABLE]
Hence, we replace in the quadratic term in (49) with and consider all the three components, which leads to the GLR prior in (43) and thus an alternative perspective of GLR under first-order IGMRF.
V-C3 Final formulation for point cloud denoising
Combining (39), (41), (42) and (43), we have
[TABLE]
We rewrite the objective by taking the logarithm of (51) and multiplying by . Also, with the constraints of considered, we have the final problem formulation for point cloud denoising
[TABLE]
where .
Next, we develop an alternating algorithm to solve (52).
V-D Proposed Algorithm for Point Cloud Denoising
We propose to address (52) by alternately optimizing the point cloud and the Mahalanobis distance matrix . The iterations terminate when the difference in the objective between two consecutive iterations stops decreasing. Parameter settings are presented in Section VI.
V-D1 Optimizing the point cloud
In the first iteration, we initialize with an identity matrix and thus fix in (52). Next, taking the derivative of (52) with respect to the three components of , , we have
[TABLE]
where is an identity matrix. (53) can be treated as three linear equation sets and thus can be efficiently solved using conjugate gradient (CG) methods, such as the LSQR algorithm [74]. The acquired solution of is then used to update in the subsequent iteration.
V-D2 Optimizing the Mahalanobis distance matrix
When is fixed, the optimization of is feature metric learning in Sec. IV. is then computed from optimized by definition and our edge weight kernel.
Specifically, we consider two features: Cartesian coordinates and normals. As done in [61, 62], surface normals are adopted to promote smoothness of the underlying surface on which the point clouds are discrete samples. Along with coordinates, we thus form a -dimensional feature vector at each point , i.e., , where denotes the coordinates of point , and denotes its normal vector. Together with one observation with coordinates , these per-node feature vectors are used for feature metric learning of matrix as described in Section IV.
The inter-node sample difference square in (14) now denotes the squared Euclidean distance between the coordinates of and , namely,
[TABLE]
where denotes and are corresponding points that are connected.
A flowchart of the proposed feature graph learning for point cloud denoising is demonstrated in Fig. 2, and an algorithmic summary is presented in Algorithm 1.
VI Experimental Results
VI-A Experimental Setup
We evaluate feature graph learning for point cloud denoising by testing on several point cloud datasets:
- Face model [50] and Iron Vise model [51], which are raw scans from laser scanners exhibiting real-world noise but without ground truth;
- the surface reconstruction benchmark models including five point clouds [75] and Fandisk model [37], which are clean point clouds as ground truth. We add Additive White Gaussian Noise (AWGN) to benchmark with a range of standard deviation for extensive objective comparison. AWGN with is added to Fandisk for subjective comparison. For the choice of in (14), we first select a median value of , and search around this value at a step size of 0.1 to obtain good results for each dataset. We observe that the denoising performance of our algorithm is relatively insensitive to experimentally.
We compare the proposed approach with seven competing point cloud denoising algorithms, including two MLS-based methods APSS [47] and RIMLS [48], one LOP-based method AWLOP [51], one sparsity based method MRPCA [39], two non-local based methods NLD [55] and LR [40], and one graph-based method GLR [41]. The implementation details are as follows. We employ the toolbox of the MeshLab software [76] to run APSS and RIMLS. We try each filter scale in the range and choose one with the best result. For AWLOP, we use its function in the EAR software [51], and choose the repulsion force in and the filter iteration in the range to acquire one best solution. The source codes of MRPCA and GLR are provided by the authors. We try data fitting iterations in the range for MRPCA, and follow the parameter settings in [41] for GLR. We implement NLD and LR in MATLAB, and follow the default settings in their papers. The Diagonal approach is implemented by ourselves in MATLAB.
Further, to validate the necessity of optimizing edge weights, we compare against two Baseline schemes of our method: 1) Baseline1, where the edge weights are randomly set in range instead of optimizing via feature graph learning; 2) Baseline2, where the edge weights are calculated from feature vectors using an exponential kernel.
VI-B Experimental Results
VI-B1 Demonstration of Iterations
To show the fast convergence speed of our algorithm, we demonstrate the denoising results of Iron Vise model in every two iterations. We set the weighting parameter of GLR in (52), where is the natural logarithmic base and is the iteration index starting from . decreases with iterations so as to prevent over-smoothing. As presented in Fig. 3, as the number of iterations increases, the point cloud gradually becomes smoother, until it almost converges at iteration 7.
VI-B2 Objective Comparison
We measure the quality of denoised results for Benchmark models by the Mean Squared Error (MSE) and Signal-to-Noise Ratio (SNR) between each denoised point cloud and the ground truth as in [62]. Numerical results are listed in Table III and Table IV, respectively.
Both tables show that our method outperforms all the other competing approaches at various noise levels in general, especially at high noise level . Also, we outperform the Baseline scheme with random weights, which validates optimizing edge weights is essential.
Further, to demonstrate the case of learning from partial observation of one signal, we randomly sample a subset of points () in each point cloud for the learning of the feature metric at noise level . On one hand, we achieve comparable results with learning from the entire single observation as listed in Table III and Table IV. On the other hand, we compare with the diagonal-only method in [27], where only a diagonal feature metric is learned from of points. Results show that we still outperform [27] in the circumstance of learning feature metric from partial observation. This validates the effectiveness of our method even when extending to the case of partial observation of one signal.
VI-B3 Comparison with Vanilla Proximal Gradient
Moreover, we compare with the naïve realization of proximal gradient Vanilla PG as discussed in Section IV-A to solve (14). As presented in Table V, while our proposed algorithm approximates the original search space in Vanilla PG by rewriting the PD constraint, our point cloud denoising results are very close to the performance by Vanilla PG. This validates the effectiveness of our optimization approximation.
VI-B4 Subjective Comparison
Fig. 4 shows visual results of Quasimoto model in Benchmark without surface reconstruction for details. We compare with other denoising approaches at noise level . It can be seen that our results preserve structural details well, even for tiny components such as the cigarette in Fig. 4. In comparison, the cigarette is distorted in all the other reconstruction results. Also, points in our results are more uniformly distributed than the others, even for noise with large variance.
In Fig. 5, we add AWGN to Fandisk model with standard deviation . We see that our method preserves sharp features, while the other approaches result in smoothed edges to various extent. Meanwhile, our method reconstructs smooth surfaces well, such as the bottom surface as presented in the first row of Fig. 5.
Finally, we evaluate on real-world noisy point clouds Iron Vise and Face acquired from laser scanners. Typical imperfections associated with digital scans, such as noise, non-uniform point distribution, or missing data, are ubiquitous in these datasets. As shown in Fig. 3 and Fig. 6, our method is able to keep local details and sharp edges while attenuating noise significantly.
VI-C Limitation
The limitation of the proposed algorithm includes two aspects.
- •
The optimal choice of the parameter in (14)—the upper bound of the trace of the feature metric —is not obvious. We choose it empirically in the experiments, and found that our algorithm’s denoising performance is relatively insensitive to experimentally.
- •
The requirement of having a complete feature vector per graph node may not be practical for some applications.
VII Conclusion
We study feature graph learning to identify an appropriate underlying graph given a single signal observation. Assuming the availability of relevant features per node, we formulate the problem as minimization of Graph Laplacian Regularizer using the Mahalanobis distance matrix as variable. We develop a fast algorithm to alternately optimize diagonal and off-diagonal entries of , while keeping positive definite. We mitigate full matrix eigen-decomposition and large matrix inverse for fast computation. To validate the effectiveness of the proposed feature graph learning, we employ it for 3D point cloud denoising with 3D coordinates and surface normals as features, leading to state-of-the-art performance.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,” Proceedings of the IEEE , vol. 106, no. 5, pp. 787–807, May 2018.
- 2[2] X. Dong, D. Thanou, M. Rabbat, and P. Frossard, “Learning graphs from data: A signal representation perspective,” IEEE Signal Processing Magazine , vol. 36, no. 3, pp. 44–63, 2019.
- 3[3] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots: Identifying network structure via graph signal processing,” IEEE Signal Processing Magazine , vol. 36, no. 3, pp. 16–43, May 2019.
- 4[4] J. Pang and G. Cheung, “Graph laplacian regularization for image denoising: Analysis in the continuous domain,” IEEE Transactions on Image Processing (TIP) , vol. 26, no. 4, pp. 1770–1785, 2017.
- 5[5] X. Liu, G. Cheung, X. Wu, and D. Zhao, “Random walk graph laplacian-based smoothness prior for soft decoding of jpeg images,” IEEE Transactions on Image Processing (TIP) , vol. 26, no. 2, pp. 509–524, 2016.
- 6[6] Y. Bai, G. Cheung, X. Liu, and W. Gao, “Graph-based blind image deblurring from a single photograph,” IEEE Transactions on Image Processing (TIP) , vol. 28, no. 3, pp. 1404–1418, 2019.
- 7[7] X. Liu, G. Cheung, X. Ji, D. Zhao, and W. Gao, “Graph-based joint dequantization and contrast enhancement of poorly lit jpeg images,” IEEE Transactions on Image Processing (TIP) , vol. 28, no. 3, pp. 1205–1219, 2019.
- 8[8] N. Meinshausen and P. Bühlmann, “High-dimensional graphs and variable selection with the lasso,” The Annals of Statistics , vol. 34, no. 3, pp. 1436–1462, 2006.
