ModalFormer: Multimodal Transformer for Low-Light Image Enhancement

Alexandru Brateanu; Raul Balmez; Ciprian Orhei; Codruta Ancuti; Cosmin Ancuti

arXiv:2507.20388·cs.CV·July 29, 2025

TL;DR

ModalFormer is a novel multimodal transformer framework that leverages nine auxiliary modalities to significantly improve low-light image enhancement, outperforming existing methods on benchmark datasets.

Contribution

Introduces ModalFormer, the first large-scale multimodal framework for low-light image enhancement (LLIE), which integrates nine auxiliary modalities through a novel cross-modal self-attention mechanism.

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Effectively fuses diverse modalities for enhanced image restoration.

03. Demonstrates robustness across various low-light conditions.

Abstract

Low-light image enhancement (LLIE) is a fundamental yet challenging task due to the presence of noise, loss of detail, and poor contrast in images captured under insufficient lighting conditions. Recent methods often rely solely on pixel-level transformations of RGB images, neglecting the rich contextual information available from multiple visual modalities. In this paper, we present ModalFormer, the first large-scale multimodal framework for LLIE that fully exploits nine auxiliary modalities to achieve state-of-the-art performance. Our model comprises two main components: a Cross-modal Transformer (CM-T) designed to restore corrupted images while seamlessly integrating multimodal information, and multiple auxiliary subnetworks dedicated to multimodal feature reconstruction. Central to the CM-T is our novel Cross-modal Multi-headed Self-Attention mechanism (CM-MSA), which effectively…
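As a rough illustration of what fusing multimodal features through attention can look like, the sketch below implements a generic multi-head cross-attention in NumPy, where tokens from the RGB image query tokens gathered from auxiliary modalities. This is not the paper's CM-MSA, whose details are not given in the abstract shown here. The function name `cross_modal_attention`, the weight matrices, the token counts, and the residual connection are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_tokens, aux_tokens, w_q, w_k, w_v, num_heads):
    """Generic multi-head cross-attention (hypothetical, not the paper's CM-MSA).

    rgb_tokens: (N, D) features from the low-light RGB image.
    aux_tokens: (M, D) features pooled from the auxiliary modalities.
    w_q, w_k, w_v: (D, D) projection matrices.
    """
    n, d = rgb_tokens.shape
    m = aux_tokens.shape[0]
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    head_dim = d // num_heads

    # Queries come from RGB; keys and values come from the auxiliary tokens.
    q = (rgb_tokens @ w_q).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    k = (aux_tokens @ w_k).reshape(m, num_heads, head_dim).transpose(1, 0, 2)
    v = (aux_tokens @ w_v).reshape(m, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (H, N, M).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Weighted sum of auxiliary values, then merge heads back to (N, D).
    fused = (weights @ v).transpose(1, 0, 2).reshape(n, d)

    # Residual connection (an assumption) keeps the original RGB features.
    return rgb_tokens + fused, weights


# Toy usage: 16 RGB tokens, nine auxiliary modalities with 16 tokens each.
rng = np.random.default_rng(0)
dim, heads = 32, 4
rgb = rng.standard_normal((16, dim))
aux = rng.standard_normal((9 * 16, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out, attn = cross_modal_attention(rgb, aux, w_q, w_k, w_v, heads)
```

Here `out` keeps the shape of the RGB token matrix, and each row of `attn` is a probability distribution over the auxiliary tokens for one head. A learned model would train the projections and likely add normalization and an output projection, which are omitted to keep the sketch short.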

Code & Models

Hugging Face model: albrateanu/ModalFormer
