TL;DR
MAVFusion is an efficient end-to-end video fusion framework that uses motion-aware sparse interactions to combine infrared and visible videos, improving speed and quality by focusing on dynamic regions.
Contribution
It introduces a novel motion-aware sparse interaction mechanism that adaptively allocates attention to dynamic regions, significantly enhancing efficiency and fusion quality.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Runs at 14.16 FPS at 640x480 resolution.
Effectively preserves temporal consistency and details.
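As a rough worked figure derived only from the reported 14.16 FPS at 640x480, the average per-frame latency and pixel throughput are shown below. This ignores batching, I/O, and hardware details, none of which are given here.

```latex
t_{\text{frame}} = \frac{1}{14.16\ \text{FPS}} \approx 0.0706\ \text{s} \approx 70.6\ \text{ms},
\qquad
640 \times 480 \times 14.16 = 4{,}349{,}952 \approx 4.35 \times 10^{6}\ \text{pixels/s}.
```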
Abstract
Infrared and visible video fusion combines the object saliency from infrared images with the texture details from visible images to produce semantically rich fusion results. However, most existing methods are designed for static image fusion and cannot effectively handle frame-to-frame motion in videos. Current video fusion methods improve temporal consistency by introducing interactions across frames, but often at a high computational cost. To mitigate these challenges, we propose MAVFusion, an end-to-end video fusion framework featuring a motion-aware sparse interaction mechanism that enhances efficiency while maintaining superior fusion quality. Specifically, we leverage optical flow to identify dynamic regions in multi-modal sequences, adaptively allocating computationally intensive cross-modal attention to these sparse areas to capture salient transitions and facilitate…
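The abstract describes using optical flow to flag dynamic regions and spending expensive cross-modal attention only there. A minimal pure-Python sketch of that idea follows, under loud assumptions: flow is given as one precomputed (dx, dy) vector per patch token, a fixed magnitude threshold decides which tokens are dynamic, each dynamic infrared token attends over all visible tokens with plain scaled dot-product attention, and static tokens fall back to an element-wise average. The function names, the threshold, and both fusion rules are hypothetical illustrations, not MAVFusion's actual architecture, which is not specified here.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dynamic_token_indices(flow: List[Tuple[float, float]], threshold: float) -> List[int]:
    """Hypothetical helper: indices of patches whose flow magnitude exceeds `threshold`."""
    return [i for i, (dx, dy) in enumerate(flow) if math.hypot(dx, dy) > threshold]


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention of one query over all key/value tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]


def motion_aware_fuse(
    ir: List[Vec], vis: List[Vec], flow: List[Tuple[float, float]], threshold: float
) -> Tuple[List[Vec], List[int]]:
    """Hypothetical sparse fusion: attention only on motion-flagged tokens."""
    assert len(ir) == len(vis) == len(flow), "one flow vector per token pair"
    dynamic = set(dynamic_token_indices(flow, threshold))
    fused: List[Vec] = []
    for i, (a, b) in enumerate(zip(ir, vis)):
        if i in dynamic:
            # Expensive path (assumed): the infrared token queries every visible token.
            attended = cross_attend(a, vis, vis)
            fused.append([(x + y) / 2 for x, y in zip(a, attended)])
        else:
            # Cheap path (assumed): element-wise average of the two modalities.
            fused.append([(x + y) / 2 for x, y in zip(a, b)])
    return fused, sorted(dynamic)


ir = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vis = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
flow = [(0.0, 0.1), (3.0, 4.0), (0.0, 0.0)]
fused, dyn = motion_aware_fuse(ir, vis, flow, threshold=1.0)
print(dyn)
```

Here only token 1 (flow magnitude 5.0) takes the attention path. The design point is the cost split: attention work grows with the number of dynamic tokens times the total token count, so when motion is sparse most tokens take the cheap path.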
