TL;DR
This paper introduces SMoEStereo, a novel framework that enhances stereo matching robustness across domains by adaptively integrating Vision Foundation Models with mixture-of-experts and low-rank adaptation techniques, balancing accuracy and efficiency.
Contribution
The paper presents a scene-specific fusion approach using MoE and LoRA modules to improve cross-domain stereo matching robustness, with a lightweight decision network for efficiency.
Findings
Achieves state-of-the-art cross-domain performance on multiple benchmarks.
Effectively balances accuracy and computational efficiency.
Demonstrates robustness without dataset-specific tuning.
Abstract
Recently, learning-based stereo matching networks have advanced significantly. However, they often lack robustness and struggle to achieve strong cross-domain performance due to domain shifts and imbalanced disparity distributions among diverse datasets. Leveraging Vision Foundation Models (VFMs) can intuitively enhance a model's robustness, but integrating such models into stereo matching cost-effectively, so that their robustness is fully realized, remains a key challenge. To address this, we propose SMoEStereo, a novel framework that adapts VFMs for stereo matching through a tailored, scene-specific fusion of Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) modules. SMoEStereo introduces MoE-LoRA with adaptive ranks and MoE-Adapter with adaptive kernel sizes. The former dynamically selects optimal experts within MoE to adapt to varying scenes across domains, while the latter injects…
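The idea of MoE-LoRA with adaptive ranks can be illustrated with a minimal sketch. It assumes a single frozen linear layer, three LoRA experts of ranks 2, 4, and 8, and top-1 routing by a small linear router. All names (`make_lora_expert`, `route_top1`, `moe_lora_forward`), the rank choices, and the routing rule are hypothetical; the excerpt above does not specify how SMoEStereo routes inputs or where the modules sit inside the VFM, so this is not the paper's implementation.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def make_lora_expert(d_in, d_out, rank, rng):
    """Hypothetical LoRA expert: A is (rank x d_in), B is (d_out x rank).
    B starts at zero (the usual LoRA initialisation), so the update is a no-op at first."""
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return {"A": A, "B": B, "rank": rank}

def route_top1(x, W_router):
    """Return the index of the expert with the highest router logit (top-1 is an assumption)."""
    logits = matvec(W_router, x)
    return max(range(len(logits)), key=logits.__getitem__)

def moe_lora_forward(x, W_frozen, experts, W_router, alpha=1.0):
    """Compute y = W_frozen x + (alpha / r) * B (A x) for the selected expert only."""
    k = route_top1(x, W_router)
    expert = experts[k]
    base = matvec(W_frozen, x)                            # frozen backbone path
    delta = matvec(expert["B"], matvec(expert["A"], x))   # low-rank update
    scale = alpha / expert["rank"]
    return [b + scale * d for b, d in zip(base, delta)], k

rng = random.Random(0)
d_in, d_out = 8, 6
W_frozen = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
experts = [make_lora_expert(d_in, d_out, r, rng) for r in (2, 4, 8)]  # different ranks
W_router = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(len(experts))]
x = [rng.gauss(0, 1) for _ in range(d_in)]

y, chosen = moe_lora_forward(x, W_frozen, experts, W_router)
```

Only the selected expert's `A` and `B` matrices contribute to the output, and in a setup like this only the experts and the router would be trained while `W_frozen` stays fixed. Experts with different ranks give the router a choice between cheaper and more expressive updates per input.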
Taxonomy
Methods: Mixture of Experts
