Multi-feature Distance Metric Learning for Non-rigid 3D Shape Retrieval

Huibing Wang; Haohao Li; Xianping Fu

arXiv:1901.03031·cs.CV·January 11, 2019

TL;DR

This paper introduces a multi-feature distance metric learning approach for non-rigid 3D shape retrieval, effectively integrating multiple geometric features to improve retrieval accuracy over existing methods.

Contribution

The study proposes a novel multi-feature distance metric learning method that utilizes KL-divergence to combine complementary features into a shared latent space for better shape retrieval.

Findings

01. Achieves the best score on all five measures on SHREC’11 and the best FT and ST scores on SHREC’15 among the compared methods.

02. Effectively integrates multiple features for improved shape similarity measurement.

03. Demonstrates robustness in non-rigid 3D shape retrieval tasks.

Abstract

Over the past decades, feature-learning-based 3D shape retrieval approaches have received widespread attention in the computer graphics community. These approaches usually use a hand-crafted distance metric or conventional distance metric learning methods to compute the similarity of a single feature. A single feature captures only one kind of geometric information, which cannot characterize 3D shapes well. Therefore, multiple features should be used for the retrieval task to overcome the limitation of a single feature and further improve performance. However, most conventional distance metric learning methods fail to integrate the complementary information from multiple features when constructing the distance metric. To address this issue, a novel multi-feature distance metric learning method for non-rigid 3D shape retrieval is presented in this study, which can make…

Tables

Table 1: Five quantitative measures on SHREC’11

Method	NN	FT	ST	E	DCG
FOG	96.8	81.7	90.3	66.0	94.4
BOW-LSD	95.5	67.2	80.3	57.9	89.7
MDS-CM-BOF	99.5	91.3	96.9	71.1	98.2
LSF	99.5	79.9	86.3	63.3	94.3
SD-GDM	100	96.2	98.4	73.1	99.4
MeshSIFT	99.5	88.4	96.2	70.8	98.0
Our method	100	100	100	74.5	100

Table 2: Five quantitative measures on SHREC’15

Method	NN	FT	ST	E	DCG
HAPT	100.0	96.1	97.9	81.2	99.9
SG_L1	97.3	75.9	81.4	65.9	91.9
FVF-WKS	100	82.5	86.3	88.3	71.8
SID	97.7	71.9	82.1	64.8	92.0
EDBCF_NW	97.8	79.3	83.4	70.8	94.3
Our method	100	99.2	99.7	82.7	99.5

Equations

$$d_{v}(x_{i}^{v},x_{j}^{v})=\sqrt{(x_{i}^{v}-x_{j}^{v})^{T}A_{v}(x_{i}^{v}-x_{j}^{v})},$$

$$A_{v}=L_{v}^{T}L_{v}.$$

$$d_{v}(x_{i}^{v},x_{j}^{v})=\sqrt{(x_{i}^{v}-x_{j}^{v})^{T}A_{v}(x_{i}^{v}-x_{j}^{v})}=\left\|L_{v}x_{i}^{v}-L_{v}x_{j}^{v}\right\|_{2}.$$

$$\delta_{ij}\left(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v})\right)>1. \tag{1}$$

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2}. \tag{2}$$

$$\min_{L_{v}}KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right) \tag{3}$$

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\beta\sum_{v=1}^{m}KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2} \tag{4}$$

$$KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right)=\frac{1}{2}D_{\ell d}(L_{v}^{T}L_{v},A^{*})$$

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\alpha\sum_{v=1}^{m}\left(tr(L_{v}^{T}L_{v}A^{*^{-1}})-\log\det(L_{v}^{T}L_{v}A^{*^{-1}})-n\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2} \tag{5}$$

$$L_{v}^{t+1}=L_{v}^{t}-\epsilon\left(L_{v}^{t}\sum_{i,j}\frac{\delta_{ij}(x_{i}^{v}-x_{j}^{v})(x_{i}^{v}-x_{j}^{v})^{T}}{1+\exp(-\rho z_{ij})}+2\alpha L_{v}^{t}A^{*^{-1}}-2\alpha L_{v}^{t}(L_{v}^{t^{T}}L_{v}^{t})^{\dagger}+2\lambda_{v}L_{v}^{t}\right)$$

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Image Processing and 3D Reconstruction

Full text

H. Wang: College of Information and Science Technology, Dalian Maritime University, Dalian, China, 116021. Email: [email protected]

H. Li: School of Mathematical Sciences, Dalian University of Technology, Dalian, China, 116024. Email: [email protected]

X. Fu (corresponding author): College of Information and Science Technology, Dalian Maritime University, Dalian, China, 116021. Email: [email protected]

Abstract

Over the past decades, feature-learning-based 3D shape retrieval approaches have received widespread attention in the computer graphics community. These approaches usually use a hand-crafted distance metric or conventional distance metric learning methods to compute the similarity of a single feature. A single feature captures only one kind of geometric information, which cannot characterize 3D shapes well. Therefore, multiple features should be used for the retrieval task to overcome the limitation of a single feature and further improve performance. However, most conventional distance metric learning methods fail to integrate the complementary information from multiple features when constructing the distance metric. To address this issue, this study presents a novel multi-feature distance metric learning method for non-rigid 3D shape retrieval, which makes full use of the complementary geometric information from multiple shape features by utilizing KL-divergences. Minimizing the KL-divergence between the metric of each feature and a common metric acts as a consistency constraint, which leads to a consistent shared latent feature space for the multiple features. We apply the proposed method to 3D model retrieval and test it on well-known benchmark databases. The results show that our method substantially outperforms the state-of-the-art non-rigid 3D shape retrieval methods.

Keywords: Multi-view learning · Distance metric learning · Non-rigid 3D shape retrieval

1 Introduction

With the development of information technology wu2019cycledeep ; wu2018deepadatpt and the explosive growth in the number of 3D models, non-rigid 3D shape retrieval has been an active research topic for many years reuter2006laplace ; bronstein2011shape ; lian2013comparison ; litman2014supervised ; xie2017deepshape ; wu20183d . The 3D shape retrieval problem can be stated as follows: given a set of 3D shapes and a query shape, we would like to develop an effective algorithm to measure the similarity of the query wang2015effective to all shapes in the database. 3D models carry complicated geometric structure information, which makes it difficult to construct discriminative global features for various applications. A single global feature usually cannot characterize 3D shapes well, because one kind of intrinsic geometric information is not enough to discriminate between the various 3D shapes in the non-rigid retrieval task. Meanwhile, the non-rigid deformations of the shapes introduce noise into the features, which affects the computation of 3D shape similarity. Therefore, how to effectively calculate the distance between non-rigid 3D shapes is still a challenging problem.

In recent years, various non-rigid 3D shape retrieval algorithms have been proposed. Most of them focus on extracting intrinsic features of the shapes based on the local or global geometric structure and then measuring the similarity of these features. These approaches usually first extract novel intrinsic features for the shapes and then use a hand-crafted distance metric or conventional distance metric learning methods to compute the similarity of the features. In bronstein2011shape , a bag-of-words feature with spatial information is constructed by coding spectral signatures, and Similarity Sensitive Hashing (SSH) is then used to improve retrieval performance. Litman litman2014supervised extracts global features with a sparse dictionary learning algorithm and uses the Euclidean metric to measure the similarity between the features. These methods rely on a single intrinsic feature, which is not enough to discriminate between various 3D shapes. In contrast to single features, multiple features contain compatible and complementary geometric information that can improve retrieval performance. Chiotellis chiotellis2016non applies weighted averaging directly to two spectral signatures to construct the global features, and measures the similarity of the features with the Large Margin Nearest Neighbor (LMNN) algorithm. Some approaches xie2017deep ; ohbuchi2010distance ; lian2011shape use a weighted average of the distances between single features to measure the similarity of the shapes. These methods concatenate all features into one single feature to fit the hand-crafted distance metric or distance metric learning setting. However, this concatenation is not physically meaningful because each feature has a specific statistical property xu2013survey . Therefore, it cannot exploit the complementary geometric information to discriminate 3D shapes well.

Meanwhile, many researchers focus on multi-view learning methods wang2018multiview , which have advanced significantly in the machine learning field wu2018and ; wu2018deepatten ; wu2018andwhere . In this setting, a real-world object may have different descriptions in multiple observation spaces (views). These spaces usually look different from each other but are highly related. Multi-view methods typically combine the individual views based on either the consensus principle or the complementary principle to improve the performance of various tasks xu2013survey ; zhai2012multiview ; hardoon2004canonical ; kumar2011co ; xu2015multi . Zhai zhai2012multiview presented a multi-view metric learning method named Multiview Metric Learning with Global consistency and Local smoothness (MVML-GL) under a semi-supervised learning setting. This method first seeks a globally consistent shared latent feature space, and then learns explicit mapping functions between the input spaces and the shared latent space via regularized locally linear regression. Both steps can be solved by convex optimization in closed form. Canonical Correlation Analysis (CCA) hardoon2004canonical is a statistical method that finds linear relationships between two sets of variables. Kernel CCA (KCCA) uses the kernel function framework to extend it to nonlinear relationships. Kumar kumar2011co proposed a co-regularized framework that advances co-training for multi-view spectral clustering, in which an iterative optimization procedure updates the eigenvectors one after another. Xu xu2015multi proposed Multi-view Intact Space Learning (MISL), which integrates the complementary information encoded in multiple views to discover a latent intact representation of the data. Intact space learning provides a new multi-view representation method, and it can be extended to supervised learning problems by adding a hinge loss or a multi-view loss to the objective. Further related work wang2016iterative ; wang2015robust ; wang2017unsupervised ; wang2017effective provides a more comprehensive introduction to recent developments in multi-view learning and their relation to earlier methods. In the computer graphics community, "multi-view" refers to projections of a 3D model from multiple viewing angles. To avoid confusing the two concepts, we use the term "multi-feature" in place of the machine learning term "multi-view" wang2014exploiting .

Inspired by multi-view learning methods wang2018beyond , we develop a novel multi-feature distance metric learning algorithm in this paper, which makes full use of the geometric information from multiple shape features. The algorithm constructs a common distance metric for all features. For each feature, the distance of every intra-class pair is required to be less than a smaller threshold, and that of every inter-class pair to be higher than a larger threshold. Meanwhile, the algorithm minimizes the KL-divergence between the Gaussian distributions of different features under different distance metrics. Both constraints are adopted to obtain the common distance metric. Figure 1 shows the pipeline of the proposed framework.

The organization of this paper is as follows. In Section 2, we provide a brief overview of previous related work on local descriptors, shape features and metric learning algorithms. In Section 3, we present the details of the multi-feature metric learning algorithm for non-rigid 3D shape retrieval. In Section 4, we show the results of our experiments. Section 5 concludes the paper.

2 Related Work

Intrinsic features are important for non-rigid shape retrieval, and numerous works attempt to extract discriminative and informative intrinsic features for this task. An intrinsic feature is usually built from intrinsic descriptors of the shape. To date, most intrinsic descriptors are constructed with spectral methods based on the Laplace-Beltrami Operator (LBO) pinkall1993computing . Intrinsic descriptors are often categorized as local or global. Global descriptors can be used directly as features to measure similarity across a database. Because the points of a mesh are unordered, constructing an effective intrinsic global descriptor directly is difficult. The best-known intrinsic global descriptor is ShapeDNA reuter2006laplace , which is constructed by truncating the normalized sequence of LBO eigenvalues. Another effective global descriptor is based on the Modal Function Transformation framework kuang2015modal . In this framework, the spatial information of the intrinsic functions is used to construct an inner function, and the ordered, L2-normalized eigenvalues of the inner function are adopted as the global descriptor. These two descriptors are extracted as intrinsic features of the shapes, and the Euclidean distance or a hand-crafted distance is usually used as the similarity for shape retrieval. However, global features mainly contain geometric structure information and lose the details of the shape. Many point (local) spectral descriptors, in contrast, contain abundant local details. Rustamov rustamov2007laplace uses all the spectra (eigenvalues and eigenvectors) of a shape to construct the Global Point Signature (GPS). The GPS is an intrinsic point descriptor and is robust to topological noise, but the eigenvectors are difficult to distinguish when the corresponding eigenvalues are close to each other. Sun sun2009concise proposed the Heat Kernel Signature (HKS) based on the heat equation. The diagonal elements of the heat kernel matrix are extracted as the HKS point descriptor, which can be interpreted as the amount of heat that remains at a point of the surface after a period of time. HKS is intrinsic, multi-scale and robust, which makes it useful for non-rigid shape analysis. However, HKS is sensitive to changes in the scale of the shape. Bronstein bronstein2010scale introduced a scale-invariant version of HKS by using the Fourier transform, which moves from the time domain to the frequency domain. Then, Aubry aubry2011wave proposed the Wave Kernel Signature (WKS) based on the Schrödinger equation, which describes the average probability over time of locating a particle with a certain energy distribution at a point on the surface. WKS clearly separates the influences of different frequencies and treats all frequencies equally. Hence WKS preserves more high-frequency information than HKS. The surveys in li2014spatially ; limberger2017spectral provide more details on spectral signatures.
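As a rough illustration of the HKS description above, the diagonal of the heat kernel can be written as $\mathrm{HKS}(x,t)=\sum_{k}e^{-\lambda_{k}t}\phi_{k}(x)^{2}$, where $(\lambda_{k},\phi_{k})$ are LBO eigenpairs. The sketch below assumes the eigenpairs are already available; the function name is hypothetical, and a small graph Laplacian is used only as a stand-in for a discretized LBO.

```python
import numpy as np

def heat_kernel_signature(evals, evecs, times):
    """HKS(x, t) = sum_k exp(-lambda_k * t) * phi_k(x)**2.

    evals: (K,) eigenvalues, evecs: (V, K) eigenvectors (one row per vertex),
    times: (T,) diffusion times. Returns a (V, T) array.
    """
    decay = np.exp(-np.outer(evals, times))   # (K, T)
    return (evecs ** 2) @ decay               # (V, T)

# Toy stand-in for an LBO (an assumption): graph Laplacian of a 6-vertex cycle.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
lap = np.diag(adj.sum(axis=1)) - adj
evals, evecs = np.linalg.eigh(lap)

hks = heat_kernel_signature(evals, evecs, times=np.array([0.1, 1.0, 10.0]))
```

Each row of `hks` is a multi-scale point descriptor; small `t` emphasizes local geometry and large `t` emphasizes global structure.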

As mentioned above, although global shape descriptors can be used for shape retrieval directly, their lack of detail limits their performance on benchmarks in which the shapes contain many details. Therefore, making full use of point or local descriptors is important for the non-rigid shape retrieval task. Many approaches aggregate point descriptors, regions or partial views into global intrinsic features by using various algorithms, among which Bag of Words (BoW) is the most popular. BoW has been successfully applied in computer vision, natural language processing, etc., and in recent years it has attracted attention in the shape retrieval field [2]. The geometric equivalent of ‘words’ are local descriptors, which are quantized in a ‘geometric dictionary’ to obtain the ‘bag-of-geometric words’ litman2014supervised . This algorithm codes the local descriptors to construct a global feature that contains rich details of the shape. Bronstein bronstein2011shape exploited the BoW algorithm and added spatial relations to extract the Spatially Sensitive Bags of Features (SS-BoF), which exhibited excellent performance on the SHREC’10 ShapeGoogle benchmark. Litman litman2014supervised explored supervised dictionary learning with a sparse coding algorithm to extract global features from point descriptors. Subsequently, the Fisher Vector (FV) and Super Vector (SV) algorithms were introduced to code the point descriptors limberger2015feature . These two algorithms are similar to BoW: the ‘dictionary’ is first built with a Gaussian Mixture Model, and the local descriptors are then coded by the Gaussian distributions. They capture multi-order information, which is more informative than BoW, so FV and SV extract more comprehensive features for the shape. Unlike BoF, which aims to code the descriptors, Li li2013intrinsic proposed an intrinsic spatial pyramid matching method for the retrieval task and also achieved good performance. Furthermore, some approaches focus more on the metric between the features chiotellis2016non . Chiotellis chiotellis2016non ; xie2017deep applies weighted averaging directly to siHKS and WKS to construct the global features, and then uses the Large Margin Nearest Neighbor (LMNN) algorithm to obtain the metric between the features. This method is concise, efficient, and effective, and its results outperform many methods on the SHREC’14 benchmark. The success of this approach relies on the LMNN algorithm, which shows that the distance metric learning algorithm is also very important in the retrieval task.

Appropriate similarities between samples can improve the performance of a retrieval system. During the past decade, several well-known distance metric learning methods have been proposed for various fields davis2007information ; weinberger2009distance ; suykens1999least ; wold1987principal ; mika1999fisher ; wang2016semantic , such as ITML davis2007information , LMNN weinberger2009distance , SVMs suykens1999least , PCA wold1987principal , LDA mika1999fisher , etc. These algorithms have been used for many computer vision and computer graphics tasks, such as classification, retrieval, and correspondence. They address the problem that most features lie in complex high-dimensional spaces where the Euclidean metric is ineffective. However, most distance metric learning methods fail to integrate compatible and complementary information from multiple features to construct a distance metric. To exploit more useful information for various applications, many researchers have proposed methods that bring the multi-view setting into distance metric learning. Kan kan2016multi proposed multi-view discriminant analysis as an extension of LDA, which achieves excellent performance on multi-view features. Wu wu2016online proposed an online multi-modal distance metric learning method, which has been successfully applied to image retrieval.

3 Proposed Approach

In this section, we introduce the proposed multi-feature metric learning algorithm (MfML) for non-rigid 3D shape retrieval in detail. We extract different types of 3D intrinsic features. Some are global intrinsic shape descriptors, which describe the global structure of the shapes. Others are extracted by using the BoW algorithm to code different types of point descriptors, which capture the geometric information of local points at various scales. These multiple intrinsic features are used to train a common metric, which integrates the compatible and complementary information they carry. We then describe the optimization of the algorithm.

3.1 The Structure of Multi-feature Metric Learning

Let $X^{v}=[x^{v}_{1},x^{v}_{2},...,x^{v}_{N}]\in R^{d_{v}\times N},v=1,2,...,m$ be the training set of the $v$ th intrinsic feature, where $x^{v}_{i}\in R^{d_{v}}$ is the $i$ th sample and $N$ is the total number of samples. Mahalanobis distance metric learning seeks a square matrix to serve as the metric matrix. For the $v$ th feature, the distance between any two samples $x^{v}_{i}$ and $x^{v}_{j}$ is computed as:

$$d_{v}(x_{i}^{v},x_{j}^{v})=\sqrt{(x_{i}^{v}-x_{j}^{v})^{T}A_{v}(x_{i}^{v}-x_{j}^{v})},$$

with $A_{v}$ decomposed as:

$$A_{v}=L_{v}^{T}L_{v}.$$

Then $d_{v}(x^{v}_{i},x^{v}_{j})$ can also be written as:

$$d_{v}(x_{i}^{v},x_{j}^{v})=\sqrt{(x_{i}^{v}-x_{j}^{v})^{T}A_{v}(x_{i}^{v}-x_{j}^{v})}=\left\|L_{v}x_{i}^{v}-L_{v}x_{j}^{v}\right\|_{2}.$$
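The identity between the Mahalanobis distance under $A_{v}=L_{v}^{T}L_{v}$ and the Euclidean distance after projecting with $L_{v}$ can be checked numerically. This is a minimal sketch with random data; the dimensions and variable names are hypothetical, and the distance is taken as the square root of the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3                      # hypothetical input and projected dimensions
L = rng.standard_normal((k, d))  # projection L_v
A = L.T @ L                      # metric A_v = L_v^T L_v
xi, xj = rng.standard_normal(d), rng.standard_normal(d)

diff = xi - xj
d_mahalanobis = np.sqrt(diff @ A @ diff)        # distance under the metric A_v
d_projected = np.linalg.norm(L @ xi - L @ xj)   # Euclidean distance after projection
```

The two quantities agree up to floating-point error, which is why the learning problem can be posed over $L_{v}$ instead of $A_{v}$.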

This equation shows that learning a Mahalanobis distance metric is equivalent to finding a linear projection onto a subspace in which the Euclidean distance between two samples equals their Mahalanobis distance in the original space. We expect the Euclidean distances between positive pairs to be smaller than those between negative pairs in the subspace. Figure 2 shows the basic idea. To improve the discriminative ability, we use the following constraint hu2014discriminative :

$$\delta_{ij}\left(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v})\right)>1. \tag{1}$$

We use $C$ to denote the set of sample pairs from the same class and $M$ to denote the set of sample pairs from different classes. Let $\delta_{ij}=-1$ if $(x^{v}_{i},x^{v}_{j})\in M$ and $\delta_{ij}=1$ otherwise. Constraint (1) then requires $d_{v}^{2}<\tau-1$ for same-class pairs and $d_{v}^{2}>\tau+1$ for different-class pairs. Our algorithm adopts this constraint through the following objective:

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2}, \tag{2}$$

where $g(x)=\frac{1}{\rho}\log(1+\exp(\rho x))$ is a smoothed approximation of the hinge loss function, $\left\|L_{v}\right\|_{F}^{2}$ is the regularization term, and $\lambda_{v}$ are regularization parameters. We can find the optimal subspace projection matrices $L_{v},v=1,...,m$ by minimizing Eq. 2.
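The smoothed hinge loss and the per-feature pairwise term of the objective above can be sketched as follows. The function names and the default values of `tau` and `rho` are assumptions chosen for illustration; the sum runs over ordered pairs with $i\neq j$.

```python
import numpy as np

def smoothed_hinge(z, rho=3.0):
    """g(z) = log(1 + exp(rho * z)) / rho, computed stably via logaddexp."""
    return np.logaddexp(0.0, rho * z) / rho

def pairwise_loss(L, X, labels, tau=2.0, rho=3.0):
    """Sum over ordered pairs i != j of 0.5 * g(1 - delta_ij * (tau - d_ij^2)).

    L: (k, d) projection, X: (d, N) samples as columns, labels: (N,) class ids.
    """
    Z = L @ X                                        # projected samples, (k, N)
    sq = np.sum(Z ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z   # squared distances, (N, N)
    delta = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
    z = 1.0 - delta * (tau - d2)
    off_diag = ~np.eye(len(labels), dtype=bool)      # skip i == j
    return 0.5 * np.sum(smoothed_hinge(z[off_diag], rho))
```

Pairs that already satisfy the margin contribute almost nothing, while violating pairs are penalized roughly linearly in the size of the violation.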

However, minimizing Eq. 2 simply sums the losses of all features under constraint (1) independently, which exploits neither the consensus principle nor the complementary principle for improving learning performance. To combine the complementary information from multiple features, we adopt the hypothesis that each feature of a sample follows a Gaussian distribution with a Mahalanobis distance parameterized by $L_{v}^{T}L_{v}$ , and that all these distributions are similar. Inspired by ITML davis2007information and CMSC kumar2011co , we formulate the following cost function to measure the disagreement between the metric $A_{v}$ and the consensus metric $A^{*}$ :

$$\min_{L_{v}}KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right) \tag{3}$$

where $p$ is a multivariate Gaussian $p(x;A)=\frac{1}{Z}\exp(-\frac{1}{2}(x-\mu)^{T}A(x-\mu))$ , and $Z$ and $\mu$ are the normalizing constant and the mean vector, respectively. $A^{*}\in R^{n\times n}$ is defined as $A^{*}=\epsilon I+\frac{1}{m}(L_{1}^{T}L_{1}+L_{2}^{T}L_{2}+...+L_{m}^{T}L_{m})$ and can be treated as the common distance metric for all features. Optimizing Eq. 3 makes all the Gaussian distributions similar, which pulls every $A_{v}$ close to $A^{*}$ . Hence, by adopting both constraints, we formulate a new cost function to construct a new metric:

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\beta\sum_{v=1}^{m}KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2} \tag{4}$$

where $\beta$ balances the trade-off between the two constraints. Eq. 4 shows that MfML can separate samples from different classes by using information from multiple features. The consensus $A^{*}$ is constructed from all $A_{v}$ , so it integrates the complementary information from every feature. Meanwhile, the optimization process shows that the update of each $A_{v}$ is in turn affected by $A^{*}$ .
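The consensus metric $A^{*}=\epsilon I+\frac{1}{m}\sum_{v}L_{v}^{T}L_{v}$ is straightforward to assemble once all features share the same input dimension $n$. The sketch below uses a hypothetical function name, an assumed value for $\epsilon$, and random projections as toy data.

```python
import numpy as np

def consensus_metric(Ls, eps=1e-3):
    """A* = eps * I + (1/m) * sum_v L_v^T L_v for projections sharing input dim n."""
    n = Ls[0].shape[1]
    return eps * np.eye(n) + sum(L.T @ L for L in Ls) / len(Ls)

rng = np.random.default_rng(0)
Ls = [rng.standard_normal((4, 4)) for _ in range(3)]   # m = 3 toy features
A_star = consensus_metric(Ls)
```

The $\epsilon I$ term keeps $A^{*}$ strictly positive definite, so its inverse (needed by the divergence term) always exists.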

3.2 Optimization Process of MfML

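In this section, we provide the details of the optimization process. Computing the gradient directly from the definition of the KL-divergence is difficult. Hence, following ITML davis2007information , we simplify the second term as:

$$KL\left(p(x^{v};A^{*})\,\|\,p(x^{v};L_{v}^{T}L_{v})\right)=\frac{1}{2}D_{\ell d}(L_{v}^{T}L_{v},A^{*}),$$

where $D_{\ell d}(L_{v}^{T}L_{v},A^{*})=tr(L_{v}^{T}L_{v}A^{*^{-1}})-\log\det(L_{v}^{T}L_{v}A^{*^{-1}})-n$ . $D_{\ell d}(A,B)$ is called the Burg matrix divergence (or the LogDet divergence), which is a convex function defined over matrices. The cost function can then be reformulated as follows:

$$\min_{L_{v},v=1,...,m}\sum_{v=1}^{m}\sum_{i,j}\frac{1}{2}g\left(1-\delta_{ij}(\tau-d_{v}^{2}(x_{i}^{v},x_{j}^{v}))\right)+\alpha\sum_{v=1}^{m}\left(tr(L_{v}^{T}L_{v}A^{*^{-1}})-\log\det(L_{v}^{T}L_{v}A^{*^{-1}})-n\right)+\sum_{v=1}^{m}\lambda_{v}\left\|L_{v}\right\|_{F}^{2} \tag{5}$$

where $\alpha=\beta/2$ absorbs the factor $\frac{1}{2}$ from the KL term.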
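The LogDet divergence used in the reformulation above can be evaluated directly from its definition. This is a minimal sketch assuming both arguments are positive definite; the function name is hypothetical.

```python
import numpy as np

def logdet_divergence(A, B):
    """Burg / LogDet divergence D_ld(A, B) = tr(A B^-1) - log det(A B^-1) - n."""
    n = A.shape[0]
    M = A @ np.linalg.inv(B)
    sign, logabsdet = np.linalg.slogdet(M)   # avoids overflow in det()
    if sign <= 0:
        raise ValueError("A and B must both be positive definite")
    return np.trace(M) - logabsdet - n
```

The divergence is zero when the two matrices coincide and grows as they move apart, which is what lets it act as a consistency penalty between $L_{v}^{T}L_{v}$ and $A^{*}$.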

To solve Eq. 5, we carry out an alternating minimization. We optimize one $L_{v}$ at a time by gradient descent with the other variables fixed. The consensus metric $A^{*}$ is updated after every $L_{v}$ has been optimized, and the $L_{v}$ are then updated again based on the new $A^{*}$ . With $A^{*}$ held fixed, the Gradient Descent (GD) update for $L_{v}$ is:

$$L_{v}^{t+1}=L_{v}^{t}-\epsilon\left(L_{v}^{t}\sum_{i,j}\frac{\delta_{ij}(x_{i}^{v}-x_{j}^{v})(x_{i}^{v}-x_{j}^{v})^{T}}{1+\exp(-\rho z_{ij})}+2\alpha L_{v}^{t}A^{*^{-1}}-2\alpha L_{v}^{t}(L_{v}^{t^{T}}L_{v}^{t})^{\dagger}+2\lambda_{v}L_{v}^{t}\right)$$

where $z_{ij}=1-\delta_{ij}(\tau-d^{2}_{v}(x^{v}_{i},x^{v}_{j}))$ , $\epsilon$ here denotes the step size, and $\dagger$ denotes the pseudo-inverse. Finally, we obtain a consensus metric matrix $A^{*}$ as the output of the MfML algorithm. $A^{*}$ can be used directly to measure the similarity between features of any type that have been preprocessed by PCA to unify their dimension. The procedure for updating $L_{v}$ and $A^{*}$ shows that the information from multiple features is integrated into a co-regularized framework.
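The alternating scheme can be sketched as below. This is an illustrative reading of the procedure, not the authors' implementation: the gradient is derived from Eq. 5 with $A^{*}$ held fixed and square, full-rank $L_{v}$, and all function names, hyperparameter values, and the initialization with the identity are assumptions.

```python
import numpy as np

def value_and_grad(L, X, labels, A_star_inv, tau=2.0, rho=3.0, alpha=0.5, lam=0.01):
    """Objective for one feature (A* held fixed) and its gradient w.r.t. L.

    L: (n, n) full-rank projection, X: (n, N) samples as columns,
    labels: (N,) class ids, A_star_inv: (n, n) inverse of the consensus metric.
    """
    n = L.shape[1]
    D = X[:, :, None] - X[:, None, :]                  # D[:, i, j] = x_i - x_j
    d2 = np.sum(np.einsum('kd,dij->kij', L, D) ** 2, axis=0)
    delta = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
    z = 1.0 - delta * (tau - d2)
    off = ~np.eye(X.shape[1], dtype=bool)              # skip i == j

    M = L.T @ L @ A_star_inv
    value = (0.5 * np.sum(np.logaddexp(0.0, rho * z[off])) / rho
             + alpha * (np.trace(M) - np.linalg.slogdet(M)[1] - n)
             + lam * np.sum(L ** 2))

    w = delta * 0.5 * (1.0 + np.tanh(0.5 * rho * z))   # delta_ij * g'(z_ij)
    w[~off] = 0.0
    S = np.einsum('ij,dij,eij->de', w, D, D)           # sum_ij w_ij (x_i-x_j)(x_i-x_j)^T
    grad = (L @ S + 2.0 * alpha * L @ A_star_inv
            - 2.0 * alpha * L @ np.linalg.pinv(L.T @ L) + 2.0 * lam * L)
    return value, grad

def consensus(Ls, eps=1e-3):
    """A* = eps * I + mean of L_v^T L_v."""
    n = Ls[0].shape[1]
    return eps * np.eye(n) + sum(L.T @ L for L in Ls) / len(Ls)

def mfml_sketch(Xs, labels, n_outer=3, n_inner=5, step=1e-4):
    """Alternate gradient steps on each L_v with refreshing the consensus A*."""
    n = Xs[0].shape[0]
    Ls = [np.eye(n) for _ in Xs]                       # assumed initialization
    for _ in range(n_outer):
        A_star_inv = np.linalg.inv(consensus(Ls))      # A* fixed during the sweep
        for v, X in enumerate(Xs):
            for _ in range(n_inner):
                _, grad = value_and_grad(Ls[v], X, labels, A_star_inv)
                Ls[v] = Ls[v] - step * grad
    return consensus(Ls), Ls
```

A fixed small step is used here for simplicity; in practice a line search or a decaying step size would be a more robust choice.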

4 Experiment

In this section, we present the results of non-rigid 3D shape retrieval based on MfML, and compare it with state-of-the-art non-rigid 3D shape retrieval approaches on the SHREC’11 lian2011shape and SHREC’15 lian2015shrec ; lian2010shrec benchmark datasets. The experiments are conducted on a 3.0 GHz Core(TM) i7 computer with 16GB memory.

4.1 Experiment Setting

For all 3D shape benchmark datasets, we use 2 different types of point signatures and 1 global descriptor to form multiple shape features. The settings of the point signatures and the global descriptor used in our experiments are as follows:

1) WKS: The Wave Kernel Signature describes the average probability over time of locating a particle with a certain energy distribution at a point on the surface aubry2011wave . WKS clearly separates the influences of different frequencies, treats all frequencies equally, and organizes the intrinsic geometric information of the point in a multi-scale way.

2) siHKS: The scale-invariant Heat Kernel Signature (siHKS) is a scale-invariant version of the heat kernel descriptor bronstein2010scale . The construction is based on a logarithmically sampled scale-space, and the absolute values of the Fourier transform are then used to move the scale factor from the time domain to the frequency domain.

3) ShapeDNA: ShapeDNA is constructed by truncating the normalized sequence of the eigenvalues of the LBO reuter2006laplace . Its main advantages are its simple representation, easy comparison, and scale invariance. In spite of its simplicity, it performs well for non-rigid shape retrieval.

We use the first 100 eigenvectors of the LBO to construct the two point signatures: a 100-dimensional WKS with the variance set to 6 and a 50-dimensional siHKS with the same settings as in litman2014supervised . We then use the BoW algorithm to code the WKS and siHKS separately, obtaining the 64-dimensional BoW-WKS and BoW-siHKS global features. We use the first 40 normalized eigenvalues of the LBO as the ShapeDNA feature. As pre-processing, PCA is used to project all features into a 30-dimensional subspace.
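The feature pipeline described above (BoW coding of point signatures followed by PCA) might look roughly like the following. The text does not specify how the dictionary is learned or how descriptors are assigned to words, so hard nearest-word assignment and a random stand-in dictionary are assumptions here, as are the function names and the toy data.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assignment BoW: descriptors (V, D), codebook (K, D) -> (K,) histogram."""
    d2 = (np.sum(descriptors ** 2, axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + np.sum(codebook ** 2, axis=1)[None, :])   # squared distances, (V, K)
    words = np.argmin(d2, axis=1)                     # nearest word per vertex
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def pca_project(features, dim):
    """Project rows of features (N, D) onto the top `dim` principal directions."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 100))             # stand-in for a learned dictionary
shapes = [rng.standard_normal((500, 100)) for _ in range(40)]  # fake per-vertex WKS
bow = np.stack([bow_histogram(s, codebook) for s in shapes])   # (40, 64)
reduced = pca_project(bow, dim=30)                             # (40, 30)
```

With real data, `codebook` would typically come from clustering the training descriptors, and the same PCA basis would be reused for query shapes.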

4.2 Experiment on SHREC’11

In this section, we conduct 2 experiments on the SHREC’11 benchmark dataset. The database contains 600 watertight meshes derived from 30 original models; every class contains 1 null model and 19 deformed models based on it. First, we compare the method based on MfML with methods related to the LBO: 1) ShapeGoogle bronstein2011shape , 2) Modal Function Transformation (MFT) kuang2015modal , and 3) Supervised Dictionary Learning (SupDL) litman2014supervised , as well as with our three features used without MfML. We randomly select 60% of the labeled samples from every class as the training set. In the test stage, we project all features into a 30-dimensional subspace and use MfML to calculate the common metric $A^{*}$ . We compare with 1) ShapeDNA, 2) BoW-WKS, 3) BoW-siHKS, 4) ShapeGoogle, 5) MFT, and 6) SupDL. The test set is disjoint from the training set. The precision-recall (PR) curves are shown in the corresponding figure. In the second experiment, the test is performed on the whole dataset. We compare our MfML approach with the methods in lian2011shape : FVF-WKS, BOW-LSD, LSF, SD-GDM, FOG, MDS-CM-BOF, and MeshSIFT. We evaluate the retrieval performance with the quantitative measures from PSB shilane2004princeton : Nearest Neighbor (NN), First Tier (FT), Second Tier (ST), E-measure (E), and Discounted Cumulative Gain (DCG) (Table 1). The results are averaged over 5 runs with different training sets.
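Three of the measures listed above can be computed from a pairwise distance matrix as sketched below. The sketch follows the usual PSB-style definitions (NN: the closest match is in the query's class; FT and ST: recall within the top $|C|-1$ and $2(|C|-1)$ results); the function name is hypothetical, and every class is assumed to contain at least two shapes.

```python
import numpy as np

def retrieval_scores(dist, labels):
    """Return mean NN, FT and ST (in [0, 1]) from an (N, N) distance matrix."""
    n = len(labels)
    nn, ft, st = [], [], []
    for q in range(n):
        order = np.argsort(dist[q], kind="stable")
        order = order[order != q]                  # drop the query itself
        relevant = labels[order] == labels[q]
        c = int(relevant.sum())                    # class size minus the query
        nn.append(float(relevant[0]))
        ft.append(relevant[:c].sum() / c)
        st.append(relevant[:2 * c].sum() / c)
    return float(np.mean(nn)), float(np.mean(ft)), float(np.mean(st))
```

For a learned metric, `dist` would hold the Mahalanobis distances under $A^{*}$ between the PCA-reduced features of all shapes; multiplying the returned values by 100 gives percentages as in the tables.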

4.3 Experiment on SHREC’15

In this section, we also conduct 2 experiments on the SHREC’15 benchmark dataset. The database contains 1200 watertight meshes derived from 50 original models; every class contains 1 null model and 23 deformed models based on it. This dataset contains all the models in the SHREC’11 dataset. In every class, 20 models have the same topological structure as the original model, and the topological structures of the other 4 objects are modified by connecting parts, which makes retrieval more challenging. We randomly select 70% of the labeled samples from every class as the training set. In the test stage, we use PCA to project all features into a 30-dimensional subspace and use MfML to calculate the common metric $A^{*}$ . We compare with 1) ShapeDNA, 2) BoW-WKS, 3) BoW-siHKS, 4) ShapeGoogle, 5) MFT, and 6) SupDL. The test set is disjoint from the training set. The PR curves are shown in the corresponding figure. In the second experiment, the test is performed on the whole dataset. We compare our MfML approach with the methods in lian2015shrec ; lian2010shrec : HAPT, SG_L1, FVF-WKS, SID, and EDBCF_NW (Table 2). The results are averaged over 10 runs with different training sets.

4.4 Experiment Result

The PR curves for both datasets show that MfML outperforms the other LBO-based methods and the individual features used without MfML. In particular, MfML perfectly discriminates all types of models in SHREC’11, and on SHREC’15 it achieves the best performance, with precision close to 1. The comparison with the state-of-the-art methods in lian2011shape is given in Table 1 and Table 2. MfML outperforms the other methods on SHREC’11 on all five measures. On SHREC’15, HAPT obtains a higher DCG than MfML (99.9 vs. 99.5) and FVF-WKS reports a higher E-measure (88.3 vs. 82.7), while MfML obtains the best FT and ST. Even though other methods achieve better performance on individual measures, MfML performs better on more measures and datasets.

5 Conclusion

In this paper, we proposed a novel multi-feature metric learning method for non-rigid 3D shape retrieval. MfML aims to exploit compatible and complementary geometric information from multiple intrinsic features. For each feature, MfML makes the distance of every intra-class pair less than a smaller threshold and that of every inter-class pair higher than a larger threshold. Meanwhile, MfML minimizes the KL-divergence between the Gaussian distributions of different features under different distance metrics, so that the multiple features work together to obtain a consensus distance metric. Both constraints are adopted to obtain the common distance metric. Experiments on two benchmark datasets show that MfML is an effective multi-feature distance metric learning method.

Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant 61370142 and Grant 61272368), by the Fundamental Research Funds for the Central Universities (Grant 3132016352), and by the Fundamental Research of Ministry of Transport of P.R. China (Grant 2015329225300). Huibing Wang, Haohao Li and Xianping Fu declare that they have no conflict of interest. Huibing Wang and Haohao Li contributed equally to this article. This article does not contain any studies with human participants or animals performed by any of the authors.

Bibliography

[1] Lin Wu, Yang Wang, and Ling Shao. Cycle-consistent deep generative hashing for cross-modal retrieval. IEEE Transactions on Image Processing, 28(4):1602–1612, 2019.
[2] Lin Wu, Yang Wang, Junbin Gao, and Xue Li. Deep adaptive feature embedding with local sample distributions for person re-identification. Pattern Recognition, 73:275–288, 2018.
[3] Martin Reuter, Franz-Erich Wolter, and Niklas Peinecke. Laplace–Beltrami spectra as ‘Shape-DNA’ of surfaces and solids. Computer-Aided Design, 38(4):342–366, 2006.
[4] Alexander M Bronstein, Michael M Bronstein, Leonidas J Guibas, and Maks Ovsjanikov. Shape google: Geometric words and expressions for invariant shape retrieval. ACM Transactions on Graphics (TOG), 30(1):1, 2011.
[5] Zhouhui Lian, Afzal Godil, Benjamin Bustos, Mohamed Daoudi, Jeroen Hermans, Shun Kawamura, Yukinori Kurita, Guillaume Lavoué, Hien Van Nguyen, Ryutarou Ohbuchi, et al. A comparison of methods for non-rigid 3D shape retrieval. Pattern Recognition, 46(1):449–461, 2013.
[6] Roee Litman, Alex Bronstein, Michael Bronstein, and Umberto Castellani. Supervised learning of bag-of-features shape descriptors using sparse coding. In Computer Graphics Forum, volume 33, pages 127–136. Wiley Online Library, 2014.
[7] Jin Xie, Guoxian Dai, Fan Zhu, Edward K Wong, and Yi Fang. Deepshape: Deep-learned shape descriptor for 3D shape retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7):1335–1345, 2017.
[8] Lin Wu, Yang Wang, Ling Shao, and Meng Wang. 3D PersonVLAD: Learning deep global representations for video-based person re-identification. arXiv preprint arXiv:1812.10222, 2018.