Transformed Multi-view 3D Shape Features with Contrastive Learning

Márcus Vinícius Lobo Costa; Sherlon Almeida da Silva; Bárbara Caroline Benato; Leo Sampaio Ferraz Ribeiro; Moacir Antonelli Ponti

arXiv:2510.19955·cs.CV·October 24, 2025

TL;DR

This paper explores the use of Vision Transformers combined with contrastive learning techniques to improve 3D shape feature representation, achieving high accuracy with less labeled data in multi-view 3D analysis.

Contribution

It introduces a novel approach that integrates ViTs with contrastive learning for 3D shape understanding, demonstrating superior performance over traditional CNN-based methods.

Findings

01. Supervised contrastive losses reached about 90.6% accuracy on ModelNet10.

02. ViTs effectively capture global shape semantics.

03. Contrastive learning refines local discriminative features.

Abstract

This paper addresses the challenges of representation learning for 3D shape features by investigating state-of-the-art backbones paired with both supervised and self-supervised contrastive learning objectives. Computer vision methods struggle to recognize 3D objects from 2D images, often requiring extensive labeled data and relying on Convolutional Neural Networks (CNNs) that may overlook crucial shape relationships. Our work demonstrates that Vision Transformer (ViT)-based architectures, when paired with modern contrastive objectives, achieve promising results in multi-view 3D analysis on our downstream tasks, unifying contrastive and 3D shape understanding pipelines. For example, supervised contrastive losses reached about 90.6% accuracy on ModelNet10. The use of ViTs and contrastive learning, leveraging ViTs' ability to understand overall shapes and contrastive learning's…
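To make the "supervised contrastive loss" mentioned in the abstract concrete, here is a minimal sketch of the widely used generic SupCon formulation in plain Python. It is an illustration under assumptions, not the authors' implementation: the function names `l2_normalize` and `supcon_loss`, the temperature of 0.1, and the toy 2D embeddings are all hypothetical, and the abstract does not say which variant of the loss or which view embeddings the paper uses.

```python
import math


def l2_normalize(v):
    # Scale a (nonzero) vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative sketch).

    embeddings: list of equal-length float vectors, e.g. one per rendered view.
    labels: class label per embedding; same-label samples act as positives.
    temperature: hypothetical default, not taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample in the batch.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two views near the x-axis, two near the y-axis.
views = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(views, [0, 0, 1, 1])  # labels match the geometry
shuffled = supcon_loss(views, [0, 1, 0, 1])  # labels contradict the geometry
```

In this toy batch the loss is lower when same-class views already sit close together (`aligned`) than when the labels cut across the clusters (`shuffled`), which is the pressure a contrastive objective puts on a backbone during training.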

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging