TL;DR
DarkQA is a benchmark designed to evaluate vision-language models' robustness in low-light indoor scenes, highlighting their limitations under challenging visual conditions.
Contribution
The paper introduces a physically realistic, open-source benchmark of 9.4K question-image pairs that assesses perceptual primitives in low-light environments and is validated against real camera data.
Findings
VLM performance degrades under low-light and sensor-noise conditions.
Low-light image enhancement (LLIE) methods provide inconsistent recovery across severity levels.
DarkQA reveals systematic limitations of current VLMs in low-light scenarios.
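The findings above compare behavior across severity levels. A minimal sketch of how such a per-level breakdown could be computed is shown here. The record fields (`level`, `prediction`, `answer`) and the toy records are hypothetical and are not taken from the DarkQA release or its results.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return exact-match accuracy per degradation level.

    `records` is an iterable of dicts with hypothetical keys:
    "level" (severity label), "prediction", and "answer".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["level"]] += 1
        correct[r["level"]] += int(r["prediction"] == r["answer"])
    return {lvl: correct[lvl] / total[lvl] for lvl in total}


# Toy records for illustration only; these are not benchmark results.
toy_records = [
    {"level": 0, "prediction": "chair", "answer": "chair"},
    {"level": 2, "prediction": "table", "answer": "chair"},
    {"level": 2, "prediction": "lamp", "answer": "lamp"},
]
per_level = accuracy_by_level(toy_records)
```

Running the same breakdown on raw low-light inputs and on LLIE-enhanced inputs would make it possible to see at which levels enhancement helps and at which it does not.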
Abstract
Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments, a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkQA, an open-source benchmark for evaluating perceptual primitives under multi-level low-light conditions in embodied scenarios. DarkQA evaluates single-view egocentric observations across controlled degradation levels, isolating low-light perceptual failures before they are entangled with complex embodied tasks. The benchmark contains 9.4K deterministically generated and verifiable question-image pairs spanning five visual-primitive families. A…
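To make "multi-level low-light conditions" with sensor noise concrete, the following is a generic sketch of one common way to synthesize such degradations. It is not DarkQA's pipeline, which this summary does not describe. Everything here is an assumption: the gamma-2.2 approximation of the display curve, the Poisson shot-noise plus Gaussian read-noise model, the `full_well` and `read_noise_std` values, and the exposure scales in `levels`.

```python
import numpy as np


def simulate_low_light(image, exposure_scale, full_well=1000.0,
                       read_noise_std=2.0, gamma=2.2, rng=None):
    """Darken an image and add sensor-like noise (illustrative model only).

    image: float array with values in [0, 1], display-referred.
    exposure_scale: fraction of the original light that reaches the sensor.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Approximate the inverse display curve to get linear intensities.
    linear = np.clip(image, 0.0, 1.0) ** gamma
    # Reduce the amount of light in linear space.
    dark = linear * exposure_scale
    # Shot noise: photon counts follow a Poisson distribution.
    electrons = rng.poisson(dark * full_well).astype(np.float64)
    # Read noise: additive Gaussian noise, in electrons.
    electrons += rng.normal(0.0, read_noise_std, size=electrons.shape)
    # Normalize, clip, and re-apply the display curve.
    noisy = np.clip(electrons / full_well, 0.0, 1.0)
    return noisy ** (1.0 / gamma)


# Hypothetical severity levels, from mild to severe.
levels = [0.5, 0.25, 0.1, 0.05]
image = np.full((8, 8, 3), 0.5)
degraded = [simulate_low_light(image, s, rng=np.random.default_rng(0))
            for s in levels]
```

Applying the degradation in linear space rather than directly on display values keeps the noise signal-dependent, so darker levels become proportionally noisier.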
