Attentive Multimodal Fusion for Optical and Scene Flow
Youjie Zhou, Guofeng Mei, Yiming Wang, Fabio Poiesi, Yi Wan

TL;DR
This paper introduces FusionRAFT, a deep neural network that fuses RGB and depth data early in the process using attention mechanisms, improving optical and scene flow estimation especially in noisy or dark environments.
Contribution
The novel FusionRAFT model employs early-stage multimodal fusion with self- and cross-attention layers, enhancing robustness and accuracy over existing late-fusion methods.
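To make the idea of attention-based early fusion concrete, the following is a minimal NumPy sketch of self-attention within each modality followed by cross-attention between them. It is an illustration under stated assumptions, not the FusionRAFT implementation: the function names (`attention`, `fuse_rgb_depth`), the token shapes, the residual connections, and the omission of learned query/key/value projections are all choices made for this example and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: (N, d), keys: (M, d), values: (M, d) -> output: (N, d).
    No learned projections are applied (an assumption made for brevity).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ values


def fuse_rgb_depth(rgb_tokens, depth_tokens):
    """Hypothetical fusion step: self-attention, then cross-attention.

    Both inputs are (num_tokens, dim) feature arrays, e.g. flattened
    feature maps from separate RGB and depth encoders.
    """
    # Self-attention: each modality aggregates context from itself.
    rgb = rgb_tokens + attention(rgb_tokens, rgb_tokens, rgb_tokens)
    depth = depth_tokens + attention(depth_tokens, depth_tokens, depth_tokens)

    # Cross-attention: each modality queries the other one, so depth
    # features can compensate where RGB features are unreliable.
    rgb_fused = rgb + attention(rgb, depth, depth)
    depth_fused = depth + attention(depth, rgb, rgb)
    return rgb_fused, depth_fused


# Toy usage with random features: 16 tokens of dimension 8 per modality.
rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((16, 8))
depth_tokens = rng.standard_normal((16, 8))
rgb_fused, depth_fused = fuse_rgb_depth(rgb_tokens, depth_tokens)
print(rgb_fused.shape, depth_fused.shape)
```

In a trained network, each attention call would normally use learned linear projections and multiple heads, and a block like this could be applied at several feature levels, as the abstract describes.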
Findings
Outperforms recent methods on the synthetic FlyingThings3D dataset
Generalizes well to the real-world KITTI dataset
Shows improved robustness when the RGB input is noisy or captured in low light
Abstract
This paper investigates the estimation of optical and scene flow from RGBD information in scenarios where the RGB modality is affected by noise or captured in dark environments. Existing methods typically rely solely on RGB images or fuse the modalities at later stages, which can result in lower accuracy when the RGB information is unreliable. To address this issue, we propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities (RGB and depth). Our approach incorporates self- and cross-attention layers at different network levels to construct informative features that leverage the strengths of both modalities. Through comparative experiments, we demonstrate that our approach outperforms recent methods on the synthetic dataset FlyingThings3D, as well as the generalization…
Taxonomy
Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Advanced Optical Sensing Technologies
