ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making
Young-Chae Son, Dae-Kwan Ko, Yoon-Ji Choi, Soo-Chul Lim

TL;DR
ThermoAct introduces a thermal-aware vision-language-action framework for robotic perception, enhancing safety and efficiency by integrating thermal data with visual and language understanding in real-world tasks.
Contribution
ThermoAct is the first framework to incorporate thermal information into a vision-language-action model, with the goal of improving robotic task execution and safety.
Findings
ThermoAct improves task success rates over visual-only systems.
ThermoAct enables robots to perceive physical properties via thermal data.
Experimental validation shows enhanced safety and robustness.
Abstract
In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although thermal data can be crucial for enhancing robot safety and operational efficiency, its integration has been relatively overlooked in prior research. This paper proposes a novel Vision-Language-Action (VLA) framework that incorporates thermal information for robot task execution. The proposed system leverages a Vision-Language Model (VLM) as a high-level planner to interpret complex natural language commands and decompose them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, our approach integrates thermal information, enabling the robot to perceive physical…
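The two ideas in the abstract can be illustrated with a small sketch. The first is a high-level planner that splits a command into sub-tasks. The second is a low-level policy that acts on observations combining visual and thermal data. This is not the authors' implementation. The fusion scheme shown here is an assumption: a thermal reading is normalized and appended to each RGB pixel as a fourth channel (simple early fusion). `plan` and `act` are hypothetical stand-ins for the VLM planner and the action policy, and the 0-100 °C normalization range is also assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    rgb: Sequence[Sequence[Sequence[int]]]  # H x W x 3, values in 0..255
    thermal: Sequence[Sequence[float]]      # H x W, degrees Celsius


def fuse(obs: Observation, t_min: float = 0.0, t_max: float = 100.0) -> List[List[List[float]]]:
    """Assumed early fusion: append a clipped, normalized thermal channel to each RGB pixel (H x W x 4)."""
    fused = []
    for rgb_row, t_row in zip(obs.rgb, obs.thermal):
        row = []
        for (r, g, b), t in zip(rgb_row, t_row):
            # Map temperature into [0, 1] over the assumed range and clip outliers.
            t_norm = min(max((t - t_min) / (t_max - t_min), 0.0), 1.0)
            row.append([r / 255.0, g / 255.0, b / 255.0, t_norm])
        fused.append(row)
    return fused


def run(
    command: str,
    plan: Callable[[str], List[str]],        # hypothetical high-level planner (a VLM in the paper)
    act: Callable[[str, list], str],         # hypothetical low-level policy
    obs: Observation,
) -> List[str]:
    """Decompose the command, then execute each sub-task on the fused RGB + thermal input."""
    return [act(subtask, fuse(obs)) for subtask in plan(command)]


# Usage with stub components (illustrative only).
def stub_plan(cmd: str) -> List[str]:
    return [s.strip() for s in cmd.split(" then ")]


def stub_act(subtask: str, x: list) -> str:
    return f"{subtask} ({len(x[0][0])} channels)"


obs = Observation(rgb=[[[255, 0, 0], [0, 255, 0]]], thermal=[[25.0, 150.0]])
print(run("pick up the cup then place it on the tray", stub_plan, stub_act, obs))
```

Splitting planning from execution means the low-level policy only needs to learn short sub-tasks. This is consistent with the abstract's claim that decomposition eases data collection. The fusion step could be replaced by a separate thermal encoder without changing the loop.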
