LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

Xinwei He; Silin Cheng; Dingkang Liang; Song Bai; Xi Wang; Yingying Zhu

arXiv:2109.01291 · cs.CV · August 28, 2023 · 1 citation

Open Access

TL;DR

LATFormer is a transformer-based model for 3D shape recognition that fuses local point cloud and multi-view image features according to their co-occurrence scores, yielding more discriminative representations.

Contribution

The paper introduces LATFormer, a locality-aware fusion transformer that models local feature co-occurrence for better 3D shape understanding, unlike prior global fusion methods.

Findings

01. Outperforms existing methods on 3D shape benchmarks

02. Effectively fuses multi-scale local features from point clouds and images

03. Reduces redundancy by filtering low co-occurrence scores
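
To make the idea in the findings above concrete, here is a minimal sketch of fusing local features by co-occurrence score and dropping low-scoring pairs. It is not the authors' implementation: the scoring rule (a softmax over scaled dot products), the residual-style fusion, the `threshold` value, and the function names `co_occurrence_scores` and `fuse_locally` are all assumptions made for illustration.

```python
import math


def co_occurrence_scores(point_feats, view_feats):
    """Score every (point region, view region) pair.

    Assumption: scores are a row-wise softmax over scaled dot products.
    The paper's actual scoring function may differ.
    """
    dim = len(point_feats[0])
    scores = []
    for p in point_feats:
        logits = [
            sum(a * b for a, b in zip(p, v)) / math.sqrt(dim)
            for v in view_feats
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        scores.append([e / total for e in exps])
    return scores


def fuse_locally(point_feats, view_feats, threshold=0.2):
    """Add to each point feature the view features that co-occur with it.

    Pairs scoring below `threshold` (a hypothetical hyperparameter) are
    zeroed out, and the surviving scores are renormalized before use.
    """
    fused = []
    all_scores = co_occurrence_scores(point_feats, view_feats)
    for p, row in zip(point_feats, all_scores):
        kept = [s if s >= threshold else 0.0 for s in row]
        norm = sum(kept)
        if norm == 0.0:
            # No view region passed the filter: keep the point feature as is.
            fused.append(list(p))
            continue
        context = [
            sum(s / norm * v[k] for s, v in zip(kept, view_feats))
            for k in range(len(p))
        ]
        fused.append([a + c for a, c in zip(p, context)])
    return fused
```

Calling `fuse_locally(point_feats, view_feats)` with two lists of equal-length feature vectors returns one fused vector per point region. Raising `threshold` discards more weakly related view regions, which is the redundancy-reduction effect the third finding describes.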

Abstract

Recently, 3D shape understanding has made significant progress thanks to advances in deep learning models for various data formats such as images, voxels, and point clouds. Among these, point clouds and multi-view images are two complementary modalities of 3D objects, and learning representations by fusing the two has proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, herein we argue that more discriminative features can be derived by modeling "where to fuse". To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification. The core component of LATFormer is a module named Locality-Aware Fusion (LAF), which integrates the local features of correlated regions across the two modalities based on their co-occurrence scores. We further propose…

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction