SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval

Dongyun Lin; Yi Cheng; Aiyuan Guo; Shangbo Mao; Yiqun Li

arXiv:2307.10601 · cs.CV · December 1, 2023

TL;DR

This paper introduces SCA-PVNet, a novel multi-modality feature aggregation method using self- and cross-attention mechanisms to improve 3D object retrieval performance across various datasets.

Contribution

It proposes a new deep learning framework that effectively fuses point cloud and multi-view image features using attention modules for enhanced retrieval accuracy.
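For readers new to the task, retrieval with a learned descriptor usually comes down to ranking a gallery of objects by their similarity to a query descriptor. The sketch below illustrates that step only. The function names, the toy two-dimensional descriptors, and the choice of cosine similarity are assumptions made for illustration; the summary here does not state which distance the paper uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, top_k=3):
    """Return the names of the top_k gallery objects most similar to the query.

    `gallery` maps a hypothetical object name to its descriptor vector.
    """
    ranked = sorted(gallery.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy descriptors standing in for fused point-cloud and multi-view features.
gallery = {
    "chair_a": [1.0, 0.0],
    "table": [0.0, 1.0],
    "chair_b": [0.9, 0.1],
}
top_two = retrieve([1.0, 0.0], gallery, top_k=2)
```

A more discriminative descriptor places objects of the same class closer together, which is what moves correct matches toward the top of such a ranking.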

Findings

01. Outperforms state-of-the-art methods on three diverse datasets.

02. Effectively combines multi-view and point cloud features for better discrimination.

03. Demonstrates robustness across small- to large-scale datasets.

Abstract

To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrieval is rarely developed and analyzed on large-scale datasets. In this paper, we propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism…
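The two attention patterns named in the abstract can be sketched generically as follows. This is a minimal illustration of self-attention over per-view features and cross-attention from a point cloud feature to those views, not the authors' IMAM or CMAM. The absence of learned projections, the mean pooling, the final concatenation, and every function name are assumptions made to keep the example short.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention with no learned projections (an assumption)."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    count = len(tokens)
    return [sum(t[j] for t in tokens) / count for j in range(len(tokens[0]))]

def fuse(view_feats, point_feat):
    """Hypothetical fusion: self-attention within views, then cross-attention."""
    # In-modality step: every view feature attends to all view features.
    refined_views = attend(view_feats, view_feats, view_feats)
    view_descriptor = mean_pool(refined_views)
    # Cross-modality step: the point cloud feature queries the refined views.
    cross_descriptor = attend([point_feat], refined_views, refined_views)[0]
    # Concatenating the two parts is an assumption, not the paper's design.
    return view_descriptor + cross_descriptor

# Toy inputs: three view features and one point cloud feature, all 2-D.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
point = [0.5, 0.5]
descriptor = fuse(views, point)
```

In self-attention the queries, keys, and values all come from one modality; in cross-attention the queries come from one modality and the keys and values from the other, which is how information passes between the two representations.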

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Image Processing and 3D Reconstruction