Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement

Xiaofeng Zhang; Zishan Xu; Hao Tang; Chaochen Gu; Wei Chen; Shanying Zhu; Xinping Guan

arXiv:2312.10109 · cs.CV · February 5, 2024 · 2 cites

TL;DR

Enlighten-Your-Voice is a multimodal framework that uses voice and text commands to enhance low-light images, featuring novel modules for detailed content and color adjustments, and demonstrating strong zero-shot generalization.

Contribution

The paper introduces a multimodal low-light image enhancement framework with dual attention and semantic fusion modules, enabling unsupervised zero-shot performance and interactive user control.

Findings

01. Effective low-light enhancement with multimodal interaction
02. Strong zero-shot generalization capabilities
03. Improved detail and color fidelity in enhanced images

Abstract

Low-light image enhancement is a crucial visual task, and many unsupervised methods tend to overlook the degradation of visible information in low-light scenes, which adversely affects the fusion of complementary information and hinders the generation of satisfactory results. To address this, our study introduces "Enlighten-Your-Voice", a multimodal enhancement framework that innovatively enriches user interaction through voice and textual commands. This approach does not merely signify a technical leap but also represents a paradigm shift in user engagement. Our model is equipped with a Dual Collaborative Attention Module (DCAM) that meticulously caters to distinct content and color discrepancies, thereby facilitating nuanced enhancements. Complementarily, we introduce a Semantic Feature Fusion (SFM) plug-and-play module that synergizes semantic context with low-light enhancement…

Taxonomy

Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · Image and Video Quality Assessment