All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou; Haoyu Zhang; Jiakang Yuan; Peng Ye; Tao Chen; Hao Jiang; Meiya Chen; Yangyang Zhang

arXiv:2412.09912·cs.CV·December 16, 2024

TL;DR

This paper introduces AIO-Stereo, a novel method that transfers knowledge from multiple vision foundation models into a stereo matching system, achieving state-of-the-art results across various datasets.

Contribution

It proposes a dual-level feature utilization mechanism and a selective knowledge transfer module to effectively incorporate heterogeneous VFMs into stereo matching.
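To make the idea of selective transfer from several VFMs more concrete, here is a minimal pure-Python sketch. It is not the paper's module: the function names (`select_and_fuse`, `transfer_loss`), the cosine-similarity selection rule, the `temperature` parameter, and the toy feature vectors are all assumptions chosen for illustration. It also assumes the teacher features have already been aligned to the student's dimensionality, and it does not model the dual-level aspect.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def select_and_fuse(student_feat, teacher_feats, temperature=0.1):
    """Hypothetical selection rule (an assumption, not the paper's design):
    weight each aligned teacher feature by its similarity to the student
    feature, then return the weights and the weighted sum."""
    scores = [cosine(student_feat, t) / temperature for t in teacher_feats]
    weights = softmax(scores)
    fused = [
        sum(w * t[i] for w, t in zip(weights, teacher_feats))
        for i in range(len(student_feat))
    ]
    return weights, fused


def transfer_loss(student_feat, fused):
    """Mean squared error between the student feature and the fused target."""
    return sum((s - f) ** 2 for s, f in zip(student_feat, fused)) / len(student_feat)


# Toy data: one student feature and three stand-in "VFM" features.
student = [1.0, 0.0, 0.5]
teachers = [[0.9, 0.1, 0.4], [-1.0, 0.2, 0.0], [0.0, 1.0, 0.0]]

weights, fused = select_and_fuse(student, teachers)
loss = transfer_loss(student, fused)
```

In this toy setup the first teacher is closest to the student, so it receives the largest weight. In a trainable system the selection would more plausibly be learned than fixed by a similarity rule; the fixed rule here only keeps the sketch self-contained.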

Findings

01. Achieves top performance on the Middlebury dataset

02. Ranks 1st on the ETH3D benchmark

03. Outperforms previous methods on multiple datasets

Abstract

As a fundamental vision task, stereo matching has made remarkable progress. While recent iterative optimization-based methods have achieved promising performance, their feature extraction capabilities still have room for improvement. Inspired by the ability of vision foundation models (VFMs) to extract general representations, in this work we propose AIO-Stereo, which can flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model. To better reconcile features between heterogeneous VFMs and the stereo matching model and to fully exploit prior knowledge from VFMs, we propose a dual-level feature utilization mechanism that aligns heterogeneous features and transfers multi-level knowledge. Based on this mechanism, a dual-level selective knowledge transfer module is designed to selectively transfer knowledge and integrate the advantages of multiple…
