ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Young-Chae Son; Dae-Kwan Ko; Yoon-Ji Choi; Soo-Chul Lim

arXiv:2603.25044 · cs.RO · April 10, 2026

TL;DR

ThermoAct introduces a thermal-aware vision-language-action framework for robotic perception, enhancing safety and efficiency by integrating thermal data with visual and language understanding in real-world tasks.

Contribution

ThermoAct is presented as the first vision-language-action model to incorporate thermal information, with the aim of improving robotic task execution and safety.

Findings

01 ThermoAct improves task success rates over visual-only systems.

02 ThermoAct enables robots to perceive physical properties via thermal data.

03 Experimental validation shows enhanced safety and robustness.

Abstract

In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although thermal data can be crucial for enhancing robot safety and operational efficiency, its integration has been relatively overlooked in prior research. This paper proposes a novel Vision-Language-Action (VLA) framework that incorporates thermal information for robot task execution. The proposed system leverages a Vision-Language Model (VLM) as a high-level planner to interpret complex natural language commands and decompose them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, our approach integrates thermal information, enabling the robot to perceive physical…
