Replanning Human-Robot Collaborative Tasks with Vision-Language Models via Semantic and Physical Dual-Correction

Taichi Kato, Takuya Kiyokawa, Namiko Saito, and Kensuke Harada

arXiv:2602.14551·cs.RO·February 17, 2026

TL;DR

This paper introduces a dual-correction framework for human-robot collaboration that enhances vision-language model reasoning with internal and external checks, improving task success and robustness in assembly tasks.

Contribution

It presents a novel dual-correction mechanism integrating internal logical verification and external failure rectification to improve VLM-based HRC performance.
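
The control flow implied by this description can be sketched as two nested loops. This is a minimal illustration only, not the authors' implementation: the function `run_with_dual_correction`, the `Outcome` type, the retry limits, and the three callables (`propose`, `verify`, `execute`) are all hypothetical names, and the VLM, verifier, and robot are replaced by toy stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Plan = List[str]


@dataclass
class Outcome:
    success: bool
    executed: Plan
    replans: int


def run_with_dual_correction(
    propose: Callable[[str, Optional[str]], Plan],  # stand-in for the VLM planner
    verify: Callable[[Plan], Optional[str]],        # returns an issue, or None if the plan passes
    execute: Callable[[str], bool],                 # returns True if the step succeeded
    instruction: str,
    max_internal: int = 3,   # assumed limit on pre-execution corrections
    max_external: int = 2,   # assumed limit on post-failure replans
) -> Outcome:
    executed: Plan = []
    feedback: Optional[str] = None
    replans = 0
    while True:
        # Internal correction: check each candidate plan before anything runs.
        plan: Optional[Plan] = None
        for _ in range(max_internal):
            candidate = propose(instruction, feedback)
            feedback = verify(candidate)
            if feedback is None:
                plan = candidate
                break
        if plan is None:
            return Outcome(False, executed, replans)

        # Execute step by step and stop at the first physical failure.
        failed_step: Optional[str] = None
        for step in plan:
            if execute(step):
                executed.append(step)
            else:
                failed_step = step
                break
        if failed_step is None:
            return Outcome(True, executed, replans)

        # External correction: feed the observed failure back into planning.
        if replans == max_external:
            return Outcome(False, executed, replans)
        replans += 1
        feedback = f"execution failed at: {failed_step}"


# Toy stubs (hypothetical): the first plan is illogical, and one step fails once.
def demo_propose(instruction: str, feedback: Optional[str]) -> Plan:
    if feedback is None:
        return ["place lid", "pick lid"]
    if feedback.startswith("execution failed"):
        return ["regrasp lid", "place lid"]
    return ["pick lid", "place lid"]


def demo_verify(plan: Plan) -> Optional[str]:
    if plan and plan[0] == "place lid":
        return "order: cannot place before picking"
    return None


attempts = {"place lid": 0}


def demo_execute(step: str) -> bool:
    if step == "place lid":
        attempts["place lid"] += 1
        return attempts["place lid"] > 1  # fails on the first attempt only
    return True


result = run_with_dual_correction(
    demo_propose, demo_verify, demo_execute, "put the lid on the box"
)
```

In this toy run the internal check rejects the first plan before any motion happens, and the external loop replans once after the simulated placement failure. Passing the planner, verifier, and executor as callables keeps the sketch independent of any particular model or robot interface.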

Findings

01. Improved success rate in simulation studies.

02. Effective real-world replanning in assembly tasks.

03. Enhanced robustness against physical failures.

Abstract

Human-Robot Collaboration (HRC) plays an important role in assembly tasks by enabling robots to plan and adjust their motions based on interactive, real-time human instructions. However, such instructions are often linguistically ambiguous and underspecified, making it difficult to generate physically feasible and cooperative robot behaviors. To address this challenge, many studies have applied Vision-Language Models (VLMs) to interpret high-level instructions and generate corresponding actions. Nevertheless, VLM-based approaches still suffer from hallucinated reasoning and an inability to anticipate physical execution failures. To address these challenges, we propose an HRC framework that augments VLM-based reasoning with a dual-correction mechanism: an internal correction model that verifies logical consistency and task feasibility prior to action execution, and an external…

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Multimodal Machine Learning Applications