Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation

Dian Yu; Qingchuan Zhou; Bingkun Huang; Majid Khadiv; Zewen Yang

arXiv:2603.05754 · cs.RO · March 9, 2026

TL;DR

Safe-Night VLA introduces a multimodal manipulation framework that integrates thermal perception with vision-language models and safety constraints, enabling robots to perceive unseen thermal signals and operate safely in unstructured environments.

Contribution

Safe-Night VLA is presented as the first framework to incorporate thermal perception into vision-language-action models for robotic manipulation with explicit safety constraints.

Findings

1. Outperforms RGB-only baselines on thermal-aware tasks.
2. Enables temperature-conditioned manipulation and subsurface target localization.
3. Maintains safety during execution through control barrier functions.
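
Control barrier function (CBF) safety filters are usually posed as a small quadratic program that changes the policy's nominal action as little as possible while keeping a safety function h(x) nonnegative. The sketch below shows only this generic technique, and its setup is assumed rather than taken from the paper: single-integrator end-effector dynamics (dx/dt = u), one spherical keep-out region with h(x) = ||x - c||^2 - r^2, a class-K gain alpha, and a hypothetical function name `cbf_filter`. A single affine constraint gives the QP a closed-form projection, so no solver is needed.

```python
def cbf_filter(u_nom, x, center, radius, alpha=1.0):
    """Hypothetical single-constraint CBF safety filter (illustrative sketch).

    Returns the action closest to u_nom in Euclidean norm that satisfies
    dh/dt >= -alpha * h for h(x) = ||x - center||^2 - radius^2, assuming
    single-integrator dynamics dx/dt = u. This is a generic CBF-QP, not
    the Safe-Night VLA formulation.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    h = sum(di * di for di in d) - radius ** 2   # h >= 0 means outside the sphere
    a = [2.0 * di for di in d]                   # gradient of h, so dh/dt = a . u
    b = -alpha * h                               # CBF condition: a . u >= b
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        return list(u_nom)                       # nominal action is already safe
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("x is at the obstacle center; gradient is zero")
    # Closed-form QP solution: project u_nom onto the half-space a . u >= b.
    step = -slack / norm_sq
    return [ui + step * ai for ui, ai in zip(u_nom, a)]
```

For example, an end-effector at the origin commanded straight toward an obstacle centered at (0.5, 0, 0) with radius 0.2 has its velocity command [1, 0, 0] reduced to roughly [0.21, 0, 0], while a command pointing away from the obstacle passes through unchanged. Multiple obstacles, joint limits, or higher-order dynamics would need a general QP solver and fall outside this sketch.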

Abstract

Current Vision-Language-Action (VLA) models rely primarily on RGB perception, preventing them from capturing modalities such as thermal signals that are imperceptible to conventional visual sensors. Moreover, end-to-end generative policies lack explicit safety constraints, making them fragile when encountering obstacles and novel scenarios outside the training distribution. To address these limitations, we propose Safe-Night VLA, a multimodal manipulation framework that enables robots to see the unseen while enforcing rigorous safety constraints for thermal-aware manipulation in unstructured environments. Specifically, Safe-Night VLA integrates long-wave infrared thermal perception into a pre-trained vision-language backbone, enabling semantic reasoning grounded in thermodynamic properties. To ensure safe execution under out-of-distribution conditions, we incorporate a safety filter via…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning