Stereo Anything: Unifying Zero-shot Stereo Matching with Large-Scale Mixed Data
Xianda Guo, Chenming Zhang, Youmin Zhang, Ruilin Wang, Dujun Nie, Wenzhao Zheng, Matteo Poggi, Hao Zhao, Mang Ye, Qin Zou, Long Chen

TL;DR
Stereo Anything introduces a scalable, data-centric approach that combines diverse labeled stereo datasets with synthetic stereo pairs generated from unlabeled monocular images, improving the zero-shot generalization of stereo matching models across multiple benchmarks.
Contribution
The paper presents a mixed-data training framework that improves zero-shot stereo matching without changing the model architecture.
Findings
Achieves state-of-the-art zero-shot generalization on four benchmarks.
Effectively mitigates dataset bias through mixed-data training.
Demonstrates scalability to any stereo image pair.
Abstract
Stereo matching serves as a cornerstone in 3D vision, aiming to establish pixel-wise correspondences between stereo image pairs for depth recovery. Despite remarkable progress driven by deep neural architectures, current models often exhibit severe performance degradation when deployed in unseen domains, primarily due to the limited diversity of training data. In this work, we introduce StereoAnything, a data-centric framework that substantially enhances the zero-shot generalization capability of existing stereo models. Rather than devising yet another specialized architecture, we scale stereo training to an unprecedented level by systematically unifying heterogeneous stereo sources: (1) curated labeled datasets covering diverse environments, and (2) large-scale synthetic stereo pairs generated from unlabeled monocular images. Our mixed-data strategy delivers consistent and robust…
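To make source (2) in the abstract more concrete, the sketch below shows one generic way a right view could be synthesized from a single left image. It is not the authors' pipeline, which this excerpt does not describe. It assumes a per-pixel depth map is already available (for example from a monocular depth estimator), converts it to disparity with the standard rectified-stereo relation d = f · B / Z, and forward-warps each left pixel to x − d. The function names, the focal length `focal_px`, and the baseline `baseline_m` are hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Rectified-stereo relation: disparity (pixels) = focal_px * baseline_m / depth."""
    depth = np.asarray(depth, dtype=np.float64)
    return focal_px * baseline_m / depth


def synthesize_right_view(left, disparity):
    """Forward-warp a left image into a hypothetical right view.

    left:      (H, W) or (H, W, C) array.
    disparity: (H, W) array of non-negative disparities in pixels.
    Returns (right, holes), where holes is True for pixels that no
    left pixel landed on (disocclusions or out-of-frame content).
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    # Largest disparity written so far at each target pixel; -1 means empty.
    zbuf = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            d = float(disparity[y, x])
            xr = int(round(x - d))  # a left pixel at x appears at x - d on the right
            # Keep the pixel with the larger disparity: it is nearer to the camera.
            if 0 <= xr < w and d > zbuf[y, xr]:
                right[y, xr] = left[y, x]
                zbuf[y, xr] = d
    holes = zbuf < 0
    return right, holes


if __name__ == "__main__":
    left = np.array([[10, 20, 30, 40]], dtype=np.uint8)
    depth = np.full((1, 4), 2.0)  # metres, assumed
    disp = depth_to_disparity(depth, focal_px=20.0, baseline_m=0.1)
    right, holes = synthesize_right_view(left, disp)
    print(right, holes)
```

The explicit loops are slow but keep the occlusion rule easy to read. A practical generator would vectorize the warp and fill the holes, for instance by inpainting, before using the pair for training.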
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Satellite Image Processing and Photogrammetry
