TL;DR
EmambaIR introduces an efficient state space model with sparse attention and gated modules for event-guided image reconstruction, outperforming existing methods in accuracy and efficiency.
Contribution
The paper proposes a novel framework combining sparse attention and gated state-space modules to improve event-based image reconstruction efficiency and effectiveness.
Findings
Outperforms state-of-the-art methods across six datasets
Reduces memory and computational costs significantly
Effective in motion deblurring, deraining, and HDR enhancement
Abstract
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity, hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. Our framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). TSAM efficiently performs pixel-level top-k sparse attention to guide cross-modal interactions, yielding rich yet sparse fusion features.…
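To make the idea of top-k sparse attention concrete, here is a minimal sketch of the generic technique in plain Python. It is not the paper's TSAM implementation: the function name `topk_sparse_attention`, the choice of image features as queries and event features as keys and values, and the toy data are all assumptions made for illustration. For each query, only the k highest-scoring keys receive a nonzero softmax weight.

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k largest scores; all other keys are dropped.
        keep = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Numerically stable softmax restricted to the kept scores.
        top = max(scores[j] for j in keep)
        weights = {j: math.exp(scores[j] - top) for j in keep}
        total = sum(weights.values())
        # Weighted sum of the selected value vectors.
        out = [0.0] * len(values[0])
        for j, w in weights.items():
            for c, v in enumerate(values[j]):
                out[c] += (w / total) * v
        outputs.append(out)
    return outputs

# Toy cross-modal example (assumed roles: image queries, event keys/values).
image_feats = [[1.0, 0.0], [0.0, 1.0]]
event_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
event_vals = [[10.0], [20.0], [30.0]]

fused = topk_sparse_attention(image_feats, event_feats, event_vals, k=1)
print(fused)  # [[10.0], [20.0]]

# With k equal to the number of keys, this reduces to dense softmax attention.
dense = topk_sparse_attention(image_feats, event_feats, event_vals, k=3)
```

Note that this sketch still scores every query against every key before selecting the top k, so it shows the sparsity of the attention weights rather than any reduction in compute. How TSAM achieves its efficiency is not described in the excerpt above.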
