MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching

Ao Xu; Rujin Zhao; Xiong Xu; Boceng Huang; Yujia Jia; Hongfeng Long; Fuxuan Chen; Zilong Cao; Fangyuan Chen

arXiv:2512.04358·cs.CV·January 9, 2026

Open Access

TL;DR

MAFNet is a stereo matching network that combines frequency-domain filtering with low-rank attention to produce high-quality disparity maps in real time on resource-limited devices.

Contribution

The paper introduces a multi-frequency adaptive fusion approach using frequency-aware filtering and low-rank attention, enabling real-time stereo matching with high accuracy on mobile devices.
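As a rough illustration of what "low-rank attention" usually means in the Linformer sense, the sketch below compresses keys and values along the sequence axis before computing attention, so the score matrix is n × r rather than n × n. This is a generic formulation, not the paper's module: the function names, shapes, and projection matrices `proj_k` and `proj_v` are hypothetical, and in a trained network the projections would be learned rather than supplied by hand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(q, k, v, proj_k, proj_v):
    """Linformer-style attention (illustrative sketch only).

    q, k, v:         arrays of shape (n, d)
    proj_k, proj_v:  hypothetical projections of shape (r, n), r << n
    """
    d = q.shape[-1]
    k_small = proj_k @ k                  # (r, d): keys compressed along n
    v_small = proj_v @ v                  # (r, d): values compressed along n
    scores = q @ k_small.T / np.sqrt(d)   # (n, r) instead of (n, n)
    return softmax(scores, axis=-1) @ v_small  # (n, d)
```

With r fixed, the cost of the score matrix grows linearly in n. When the projections are identity matrices (r = n), the function reduces to ordinary scaled dot-product attention.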

Findings

01. Outperforms existing real-time stereo methods on Scene Flow and KITTI 2015 datasets.

02. Achieves a good balance between accuracy and computational efficiency.

03. Utilizes only 2D convolutions for high-quality disparity estimation.

Abstract

Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs significant computational overhead during cost aggregation, whereas the latter often lacks the ability to model non-local contextual information. These methods exhibit poor compatibility on resource-constrained mobile devices, limiting their deployment in real-time applications. To address this, we propose a Multi-frequency Adaptive Fusion Network (MAFNet), which can produce high-quality disparity maps using only efficient 2D convolutions. Specifically, we design an adaptive frequency-domain filtering attention module that decomposes the full cost volume into high-frequency and low-frequency volumes, performing frequency-aware feature aggregation separately. Subsequently, we introduce a Linformer-based…
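The idea of decomposing a cost volume into low-frequency and high-frequency parts can be sketched with a plain FFT split. This is a minimal sketch under stated assumptions, not the paper's adaptive module: it uses a fixed radial low-pass mask where the abstract describes an adaptive filter, and the function name `split_cost_volume`, the `(D, H, W)` layout, and the `cutoff` parameter are all hypothetical.

```python
import numpy as np

def split_cost_volume(cost, cutoff=0.25):
    """Split a (D, H, W) cost volume into low- and high-frequency parts.

    cutoff is a hypothetical radius in cycles per pixel (Nyquist is 0.5).
    A fixed mask stands in for whatever adaptive filter a network would learn.
    """
    _, h, w = cost.shape
    fy = np.fft.fftfreq(h)[:, None]       # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]       # horizontal frequencies, shape (1, W)
    radius = np.sqrt(fx ** 2 + fy ** 2)   # (H, W) radial frequency
    low_mask = (radius <= cutoff).astype(cost.dtype)

    # Filter each disparity slice independently over its spatial axes.
    spectrum = np.fft.fft2(cost, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = cost - low                     # residual holds the remaining detail
    return low, high
```

Because the high-frequency part is defined as the residual, the two volumes sum back to the input. Smooth regions end up mostly in `low`, while edges and fine texture end up in `high`, which is the kind of separation that lets each band be aggregated on its own.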

Taxonomy

Topics: Advanced Vision and Imaging · Speech and Audio Processing · Music and Audio Processing