SMFormer: Empowering Self-supervised Stereo Matching via Foundation Models and Data Augmentation
Yun Wang, Zhengjie Yang, Jiahao Zheng, Zhanjie Zhang, Dapeng Oliver Wu, Yulan Guo

TL;DR
SMFormer is a self-supervised stereo matching framework that leverages foundation models and data augmentation to improve robustness and achieve state-of-the-art results, even surpassing some supervised methods.
Contribution
SMFormer integrates a Vision Foundation Model (VFM) with a Feature Pyramid Network (FPN) and adds a data augmentation mechanism to make self-supervised stereo matching more robust.
Findings
Achieves state-of-the-art performance among self-supervised methods.
Outperforms some supervised methods on the Booster benchmark.
Provides robust feature representations against real-world disturbances.
Abstract
Recent self-supervised stereo matching methods have made significant progress. They typically rely on the photometric consistency assumption, which presumes corresponding points across views share the same appearance. However, this assumption could be compromised by real-world disturbances, resulting in invalid supervisory signals and a significant accuracy gap compared to supervised methods. To address this issue, we propose SMFormer, a framework integrating more reliable self-supervision guided by the Vision Foundation Model (VFM) and data augmentation. We first incorporate the VFM with the Feature Pyramid Network (FPN), providing a discriminative and robust feature representation against disturbance in various scenarios. We then devise an effective data augmentation mechanism that ensures robustness to various transformations. The data augmentation mechanism explicitly enforces…
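Illustration (not from the paper): the photometric consistency assumption mentioned in the abstract can be sketched as a simple reconstruction loss. The snippet below is a minimal NumPy sketch under stated assumptions: grayscale rectified images, a left-view disparity map, linear interpolation along x, and an L1 error. The function names `warp_right_to_left` and `photometric_loss` are hypothetical and are not SMFormer's actual loss or code.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumes rectified grayscale images of shape (H, W) and a left-view
    disparity map of the same shape (hypothetical helper for illustration).
    """
    h, w = disparity.shape
    # Source x-coordinate in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    valid = (xs >= 0) & (xs <= w - 1)  # pixels whose match falls inside the image
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation along the horizontal (epipolar) direction.
    warped = (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]
    return warped, valid

def photometric_loss(left, right, disparity):
    """Mean L1 difference between the left image and the warped right image."""
    warped, valid = warp_right_to_left(right, disparity)
    err = np.abs(left - warped)
    return float(err[valid].mean())

# Synthetic example: the left view is the right view shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.random((4, 16))
d = 3
left = np.zeros_like(right)
left[:, d:] = right[:, :-d]
disp = np.full(right.shape, float(d))

loss_clean = photometric_loss(left, right, disp)           # assumption holds
loss_brighter = photometric_loss(1.5 * left, right, disp)  # appearance changed
```

With the correct disparity, the loss is essentially zero when both views share the same appearance. Scaling the brightness of one view makes the loss nonzero even though the disparity is still correct, which is the kind of invalid supervisory signal the abstract attributes to real-world disturbances.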
