TL;DR
M2Retinexformer is a multi-modal deep learning framework that enhances low-light images by integrating depth, luminance, and semantic cues through a progressive, attention-based fusion process.
Contribution
The paper introduces M2Retinexformer, a multi-modal Retinex-based model that extends Retinexformer with depth, luminance, and semantic cues and uses adaptive gating to balance them during low-light image enhancement.
Findings
Outperforms state-of-the-art methods on multiple benchmarks.
Fuses auxiliary modalities extracted at multiple scales through cross-attention, with adaptive gating weighting each cue by its reliability.
Achieves better noise reduction and color fidelity in low-light images.
Abstract
Low-light image enhancement is challenging due to complex degradations, including amplified noise, artifacts, and color distortion. While Retinex-based deep learning methods have achieved promising results, they primarily rely on single-modality RGB information. We propose M2Retinexformer (Multi-Modal Retinexformer), a novel framework that extends Retinexformer by incorporating depth cues, luminance priors, and semantic features within a progressive refinement pipeline. Depth provides geometric context that is invariant to lighting variations, while luminance and semantic features offer explicit guidance on brightness distribution and scene understanding. Modalities are extracted at multiple scales and fused through cross-attention, with adaptive gating dynamically balancing illumination-guided self-attention and cross-attention based on the reliability of auxiliary cues. Evaluations on…
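The gated fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain scaled dot-product self-attention stands in for the illumination-guided self-attention, the gate is a per-token sigmoid over logits supplied by the caller (in the paper the gate is derived from the reliability of the auxiliary cues), and the names `attend`, `gated_fusion`, `rgb_tokens`, `aux_tokens`, and `gate_logits` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def gated_fusion(rgb_tokens, aux_tokens, gate_logits):
    """Blend self-attention and cross-attention outputs per token.

    Assumption: a gate near 1 trusts the auxiliary cue (cross-attention),
    a gate near 0 falls back to the RGB-only self-attention branch.
    """
    # Stand-in for illumination-guided self-attention (assumption).
    self_out = attend(rgb_tokens, rgb_tokens, rgb_tokens)
    # RGB tokens query the auxiliary modality (depth, luminance, or semantics).
    cross_out = attend(rgb_tokens, aux_tokens, aux_tokens)
    fused = []
    for s, c, logit in zip(self_out, cross_out, gate_logits):
        g = 1.0 / (1.0 + math.exp(-logit))  # sigmoid gate in (0, 1)
        fused.append([g * ci + (1.0 - g) * si for si, ci in zip(s, c)])
    return fused


# Toy usage: two RGB tokens, three auxiliary tokens, feature dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
aux = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
fused = gated_fusion(rgb, aux, gate_logits=[0.0, 0.0])
```

With a gate logit of 0 each token is an equal mix of the two branches. A strongly negative logit makes the output collapse to the self-attention branch, which is the behaviour one would want when an auxiliary cue is unreliable.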
