Lite Any Stereo: Efficient Zero-Shot Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao, Krystian Mikolajczyk

TL;DR
Lite Any Stereo is an ultra-light stereo matching framework that achieves strong zero-shot generalization, matching or exceeding the accuracy of much larger models at a small fraction of their computational cost.
Contribution
The paper presents a compact yet expressive backbone, a hybrid cost aggregation module, and a three-stage training strategy on million-scale data, which together enable zero-shot generalization in an ultra-light stereo matching model.
Findings
Achieves 1st place on four real-world benchmarks.
Requires less than 1% of the computational cost of state-of-the-art methods.
Attains accuracy comparable to or exceeding that of larger models.
Abstract
Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot ability due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding…
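For readers unfamiliar with the term, cost aggregation operates on a matching cost volume that scores each candidate disparity for each pixel. The sketch below is a generic NumPy illustration of a correlation cost volume followed by winner-take-all disparity selection. It is not the paper's backbone or its hybrid aggregation module, and the function names and the `max_disp` parameter are assumptions made for illustration only.

```python
import numpy as np

def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) feature maps.

    Hypothetical helper for illustration; not taken from the paper.
    """
    _, h, w = left_feat.shape
    # Positions with no valid match (x < d) stay at -inf so they never win.
    volume = np.full((max_disp, h, w), -np.inf, dtype=np.float32)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at column x - d.
        volume[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, : w - d]).mean(axis=0)
    return volume

def winner_take_all(volume):
    """Pick, per pixel, the disparity with the highest correlation."""
    return volume.argmax(axis=0)

# Toy example: the left features are the right features shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.standard_normal((8, 4, 12)).astype(np.float32)
right /= np.linalg.norm(right, axis=0, keepdims=True)  # unit-norm feature vectors
left = np.roll(right, 3, axis=2)
disparity = winner_take_all(correlation_cost_volume(left, right, max_disp=6))
```

Learned stereo networks typically replace the hand-written argmax with a trainable aggregation stage over such a volume, which is where a module like the one described in the abstract would sit.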
