Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Bowen Wen, Shaurya Dewan, Stan Birchfield

TL;DR
Fast-FoundationStereo is a family of stereo matching architectures that combines knowledge distillation, blockwise neural architecture search, and structured pruning to deliver strong zero-shot generalization at real-time frame rates, running over 10x faster than FoundationStereo while keeping comparable zero-shot accuracy.
Contribution
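The paper presents a family of architectures that enable real-time, zero-shot stereo matching by combining three acceleration techniques (knowledge distillation of the backbone, blockwise neural architecture search for cost filtering, and structured pruning of the iterative refinement module) with a large-scale in-the-wild training dataset.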
Findings
Runs over 10x faster than FoundationStereo
Achieves zero-shot accuracy comparable to state-of-the-art models
Establishes a new benchmark for real-time stereo matching
Abstract
Stereo foundation models achieve strong zero-shot generalization but remain computationally prohibitive for real-time applications. Efficient stereo architectures, on the other hand, sacrifice robustness for speed and require costly per-domain fine-tuning. To bridge this gap, we present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rate. We employ a divide-and-conquer acceleration strategy with three components: (1) knowledge distillation to compress the hybrid backbone into a single efficient student; (2) blockwise neural architecture search for automatically discovering optimal cost filtering designs under latency budgets, reducing search complexity exponentially; and (3) structured pruning for eliminating redundancy in the iterative refinement module. Furthermore, we introduce an automatic…
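As a rough illustration of why searching block by block can shrink the search space exponentially, the sketch below compares the number of candidate designs in a joint search against a blockwise one. The block counts and the number of candidates per block are hypothetical values chosen for illustration, not figures from the paper, and the product-versus-sum model is a simplifying assumption about how the search decomposes.

```python
def joint_search_size(num_blocks: int, candidates_per_block: int) -> int:
    """Designs to evaluate when every combination of block choices is tried."""
    return candidates_per_block ** num_blocks


def blockwise_search_size(num_blocks: int, candidates_per_block: int) -> int:
    """Designs to evaluate when each block is searched independently."""
    return candidates_per_block * num_blocks


if __name__ == "__main__":
    # Hypothetical setting: 10 candidate designs per block.
    candidates = 10
    for blocks in (2, 4, 8):
        joint = joint_search_size(blocks, candidates)
        blockwise = blockwise_search_size(blocks, candidates)
        print(f"blocks={blocks}: joint={joint}, blockwise={blockwise}")
```

Under this simplified model the joint count grows exponentially with the number of blocks while the blockwise count grows linearly, which is the kind of reduction the abstract refers to.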
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
