Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement

Xiaoran Sun; Liyan Wang; Yeying Jin; Kin-man Lam; Zhixun Su; Yang Yang; Jinshan Pan; Cong Wang

arXiv:2507.18064·cs.CV·May 4, 2026

TL;DR

VLM-IMI is a framework that adapts large vision-language models, guided by iterative and manual instructions, to enhance low-light images so that the results are semantically informed and realistic.

Contribution

The paper proposes a cross-modal fusion approach and iterative instruction refinement for generative low-light enhancement, and reports outperforming state-of-the-art methods.

Findings

01. Outperforms SOTA methods in perception and realism.

02. Effectively integrates semantic guidance from normal-light descriptions.

03. Supports manual instruction control for user customization.

Abstract

Most existing low-light image enhancement (LLIE) methods rely on pre-trained model priors, low-light inputs, or both, while neglecting the semantic guidance available from normal-light images. This limitation hinders their effectiveness in complex lighting conditions. In this paper, we propose VLM-IMI, a framework that adapts large vision-language models with iterative and manual instructions for generative LLIE. VLM-IMI mainly contains two branches: Normal-Light Instruction Prior Generation (NL-IPG) and Instruction-aware Light Enhancement Diffusion (IA-LED). The NL-IPG incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration. IA-LED incorporates instruction priors from the NL-IPG to guide the diffusion process, enabling precise illumination enhancement. To effectively integrate cross-modal priors, we introduce…
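The two-branch flow described in the abstract can be pictured with a toy sketch. This is only an illustration of the data flow (a text-derived prior guiding an iterative enhancement loop), not the authors' implementation: the names `InstructionPrior`, `nl_ipg`, `ia_led`, the keyword table, and the gain-based update rule are all hypothetical stand-ins. In particular, the keyword lookup replaces a real vision-language model and the update loop replaces a real diffusion process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstructionPrior:
    """Hypothetical container for a text-derived enhancement cue."""
    text: str
    target_brightness: float  # assumed scalar in [0, 1], not from the paper


def nl_ipg(description: str) -> InstructionPrior:
    """Toy stand-in for NL-IPG: turn a normal-light description into a prior.

    A real system would query a vision-language model; a keyword lookup
    is used here only so that the example stays runnable.
    """
    keywords = {"sunny": 0.8, "bright": 0.7, "indoor": 0.55, "dusk": 0.4}
    hits = [v for k, v in keywords.items() if k in description.lower()]
    target = sum(hits) / len(hits) if hits else 0.5
    return InstructionPrior(text=description, target_brightness=target)


def ia_led(pixels: List[float], prior: InstructionPrior,
           steps: int = 10, rate: float = 0.3) -> List[float]:
    """Toy stand-in for IA-LED: iterative, prior-guided refinement.

    Each step rescales the image so that its mean brightness moves a
    fraction `rate` of the way toward the prior's target. This is a
    placeholder update rule, NOT a diffusion model.
    """
    out = list(pixels)
    for _ in range(steps):
        mean = sum(out) / len(out)
        gain = 1.0 + rate * (prior.target_brightness - mean) / max(mean, 1e-6)
        out = [min(1.0, max(0.0, p * gain)) for p in out]
    return out


# A manual instruction is just user-supplied text fed to the same entry point.
prior = nl_ipg("A well-lit indoor office with a desk")
dark_image = [0.02, 0.05, 0.10, 0.08]  # flattened toy "image" in [0, 1]
enhanced = ia_led(dark_image, prior)
```

Keeping the prior as a separate object mirrors the split the abstract describes: the text branch can be swapped or edited by hand without touching the enhancement loop.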

Code & Models

Repositories

sunxiaoran01/VLM-IMI (GitHub)
