Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement
Xiaofeng Zhang, Zishan Xu, Hao Tang, Chaochen Gu, Wei Chen, Shanying Zhu, Xinping Guan

TL;DR
Enlighten-Your-Voice is a multimodal framework that uses voice and text commands to enhance low-light images, featuring novel modules for detailed content and color adjustments, and demonstrating strong zero-shot generalization.
Contribution
The paper introduces a multimodal low-light image enhancement framework with dual attention and semantic fusion modules, enabling unsupervised zero-shot performance and interactive user control.
Findings
Effective low-light enhancement with multimodal interaction
Strong zero-shot generalization capabilities
Improved detail and color fidelity in enhanced images
Abstract
Low-light image enhancement is a crucial visual task, and many unsupervised methods tend to overlook the degradation of visible information in low-light scenes, which adversely affects the fusion of complementary information and hinders the generation of satisfactory results. To address this, our study introduces "Enlighten-Your-Voice", a multimodal enhancement framework that innovatively enriches user interaction through voice and textual commands. This approach does not merely signify a technical leap but also represents a paradigm shift in user engagement. Our model is equipped with a Dual Collaborative Attention Module (DCAM) that meticulously caters to distinct content and color discrepancies, thereby facilitating nuanced enhancements. Complementarily, we introduce a Semantic Feature Fusion (SFM) plug-and-play module that synergizes semantic context with low-light enhancement…
Taxonomy
Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · Image and Video Quality Assessment
