Learning to See through Illumination Extremes with Event Streaming in Multimodal Large Language Models

Baoheng Zhang; Jiahui Liu; Gui Zhao; Weizhou Zhang; Yixuan Ma; Jun Jiang; Yingxian Chen; Wilton W.T. Fok; Xiaojuan Qi; Hayden Kwok-Hay So

arXiv:2603.27558 · cs.CV · March 31, 2026



TL;DR

Event-MLLM enables multimodal large language models to perform vision-language reasoning under extreme illumination by fusing event streams with RGB inputs through adaptive, illumination-aware components.

Contribution

The paper introduces a novel event-enhanced multimodal model with adaptive fusion and illumination correction, along with a new multi-illumination event-instruction dataset.

Findings

01 Event-MLLM outperforms existing models under extreme lighting conditions.

02 The model achieves state-of-the-art results in robust perception and reasoning.

03 A new dataset of 2,241 event-RGB samples across 17 brightness levels supports evaluation.

Abstract

Multimodal Large Language Models (MLLMs) perform strong vision-language reasoning under standard conditions but fail under extreme illumination, where RGB inputs irrevocably lose structure and semantics. We propose Event-MLLM, an event-enhanced model that performs all-light visual reasoning by dynamically fusing event streams with RGB frames. Two key components drive our approach: an Illumination Indicator, a learnable signal derived from a DINOv2 branch that represents exposure degradation and adaptively modulates event-RGB fusion; and an Illumination Correction Loss, which aligns fused features with non-degraded (normal-light) semantics in the latent space to compensate for information lost under extreme lighting. We curate the first multi-illumination event-instruction corpus for MLLMs, with 2,241 event-RGB samples (around 6 QA pairs each) across diverse scenes and 17 brightness rates…
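To make the two components more concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation; the abstract does not specify the fusion rule or the loss form. It assumes that the Illumination Indicator is reduced to a single scalar logit (in the paper it is a learnable signal from a DINOv2 branch), that fusion is a sigmoid-gated convex combination of event and RGB features, and that the Illumination Correction Loss is a mean squared error against features of the same scene under normal light. All function names (`illumination_gate`, `fuse`, `illumination_correction_loss`) are illustrative.

```python
import math
from typing import List

Vector = List[float]


def illumination_gate(indicator_logit: float) -> float:
    """Map a hypothetical scalar illumination-indicator logit to a weight in (0, 1).

    Assumption: a larger logit means stronger exposure degradation, so the
    fused feature should lean more on the event stream.
    """
    return 1.0 / (1.0 + math.exp(-indicator_logit))


def fuse(rgb: Vector, event: Vector, indicator_logit: float) -> Vector:
    """Gated fusion (assumed form): alpha * event + (1 - alpha) * rgb."""
    if len(rgb) != len(event):
        raise ValueError("rgb and event features must have the same dimension")
    alpha = illumination_gate(indicator_logit)
    return [alpha * e + (1.0 - alpha) * r for r, e in zip(rgb, event)]


def illumination_correction_loss(fused: Vector, reference: Vector) -> float:
    """Mean squared error between fused features and normal-light features.

    Assumption: a normal-light reference feature for the same scene is
    available as a training target; the paper's exact loss may differ.
    """
    if len(fused) != len(reference):
        raise ValueError("fused and reference features must have the same dimension")
    return sum((f - g) ** 2 for f, g in zip(fused, reference)) / len(fused)


if __name__ == "__main__":
    rgb_dark = [0.0, 0.0, 0.0]       # underexposed RGB: structure has collapsed
    event_feat = [1.0, -1.0, 0.5]    # events still carry edge/motion structure
    normal_light = [1.0, -1.0, 0.5]  # hypothetical normal-light target feature

    for logit in (-4.0, 0.0, 4.0):
        fused = fuse(rgb_dark, event_feat, logit)
        loss = illumination_correction_loss(fused, normal_light)
        print(f"logit={logit:+.1f}  fused={fused}  loss={loss:.4f}")
```

In this toy setting the loss falls as the gate shifts weight toward the event features. That is the intuition behind letting an exposure-degradation signal modulate fusion: when RGB carries little information, the model should rely more on events. A real model would learn the indicator and operate on high-dimensional token features rather than three-element lists.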
