Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Yang You; Yixin Li; Congyue Deng; Yue Wang; Leonidas Guibas

arXiv:2411.19458 · cs.CV · February 20, 2025

TL;DR

This paper evaluates and improves the 3D spatial understanding of ViT-based vision models by enhancing the multiview equivariance of their features through minimal feature finetuning, which leads to better performance on 3D tasks.

Contribution

It introduces a simple finetuning strategy based on 3D correspondences that significantly enhances the 3D awareness of vision models with minimal updates.
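
The training objective is not spelled out on this page. As a minimal sketch, assuming a contrastive InfoNCE-style loss (our assumption, not necessarily the authors' loss), finetuning on 3D correspondences could pull each pixel feature in one view toward the feature of the pixel in another view that sees the same 3D point, while pushing it away from all other pixels in that view. The names `correspondence_infonce`, `_normalize`, `_dot`, the `temperature` default, and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalise a feature vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def correspondence_infonce(
    feats_a: Sequence[Sequence[float]],
    feats_b: Sequence[Sequence[float]],
    matches: Sequence[Tuple[int, int]],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Hypothetical InfoNCE-style loss over 3D correspondences (an assumption).

    feats_a[i] and feats_b[j] are pixel features from two views of one object;
    each (i, j) in `matches` means pixel i in view A and pixel j in view B see
    the same 3D point. The loss is small when feats_a[i] is more similar to
    feats_b[j] than to every other feature in view B.
    """
    a = [_normalize(f) for f in feats_a]
    b = [_normalize(f) for f in feats_b]
    total = 0.0
    for i, j in matches:
        # Temperature-scaled cosine similarity of the query to every candidate in view B.
        logits = [_dot(a[i], bk) / temperature for bk in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the true correspondence j as the target class.
        total += log_sum_exp - logits[j]
    return total / len(matches)
```

With features that already agree across views, e.g. `correspondence_infonce([[1, 0], [0, 1]], [[1, 0], [0, 1]], [(0, 0), (1, 1)])`, the loss is close to zero; swapping the matches to `[(0, 1), (1, 0)]` raises it to roughly 10. In a real finetuning loop the features would come from the ViT backbone and the loss would be minimised by gradient descent, which this list-based sketch does not attempt.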

Findings

01. Improved 3D equivariance enhances downstream task performance.
02. Finetuning on a single object yields substantial gains.
03. The proposed method is simple and effective.

Abstract

Vision foundation models, particularly the ViT family, have revolutionized image understanding by providing rich semantic features. However, despite their success in 2D comprehension, their ability to grasp 3D spatial relationships is still unclear. In this work, we evaluate and enhance the 3D awareness of ViT-based models. We begin by systematically assessing their ability to learn 3D equivariant features, specifically examining the consistency of semantic embeddings across different viewpoints. Our findings indicate that improved 3D equivariance leads to better performance on various downstream tasks, including pose estimation, tracking, and semantic transfer. Building on this insight, we propose a simple yet effective finetuning strategy based on 3D correspondences, which significantly enhances the 3D correspondence understanding of existing vision models. Remarkably,…
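
The abstract describes assessing equivariance as checking whether semantic embeddings stay consistent across viewpoints, but the exact metric is not given here. One simple proxy, assumed for illustration, is nearest-neighbour matching accuracy: given ground-truth pixel correspondences between two views (for example, derived from known 3D geometry), count how often a pixel's feature in view A retrieves its true partner in view B as its most similar feature. The function `nn_match_accuracy` and the toy features are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalise a feature vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nn_match_accuracy(
    feats_a: Sequence[Sequence[float]],
    feats_b: Sequence[Sequence[float]],
    matches: Sequence[Tuple[int, int]],
) -> float:
    """Hypothetical cross-view consistency proxy (an assumption, not the paper's metric).

    Returns the fraction of ground-truth matches (i, j) for which feats_b[j]
    is the nearest neighbour of feats_a[i] under cosine similarity.
    """
    b = [_normalize(f) for f in feats_b]
    hits = 0
    for i, j in matches:
        q = _normalize(feats_a[i])
        sims = [sum(x * y for x, y in zip(q, bk)) for bk in b]
        # Index of the most similar feature in view B.
        best = max(range(len(b)), key=sims.__getitem__)
        hits += int(best == j)
    return hits / len(matches)
```

A perfectly view-consistent (equivariant) feature extractor would score 1.0 on every view pair; lower scores indicate that embeddings of the same 3D point drift as the viewpoint changes.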

Code & Models

Repositories

qq456cvb/3dcorrenhance (PyTorch, official)

Models

qq456cvb/3DCorrEnhance (Hugging Face model)

Videos

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning (SlidesLive)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization