From Perception to Symbolic Task Planning: Vision-Language Guided Human-Robot Collaborative Structured Assembly
Yanyi Chen, Min Deng

TL;DR
This paper presents a vision-language guided framework for human-robot collaborative assembly that improves robustness in state estimation and task planning amid noisy perception and human interventions.
Contribution
It introduces a novel integrated perception-to-symbolic state module and human-aware planning module for structured assembly tasks, enhancing robustness and adaptability.
Findings
PSS module achieves 97% state synthesis accuracy.
HPR module maintains feasible task progression.
Framework effectively handles diverse HRC scenarios.
Abstract
Human-robot collaboration (HRC) in structured assembly requires reliable state estimation and adaptive task planning under noisy perception and human interventions. To address these challenges, we introduce a design-grounded human-aware planning framework for human-robot collaborative structured assembly. The framework comprises two coupled modules. Module I, Perception-to-Symbolic State (PSS), employs vision-language model (VLM) based agents to align RGB-D observations with design specifications and domain knowledge, synthesizing verifiable symbolic assembly states. It outputs validated installed and uninstalled component sets for online state tracking. Module II, Human-Aware Planning and Replanning (HPR), performs task-level multi-robot assignment and updates the plan only when the observed state deviates from the expected execution outcome. It applies a minimal-change replanning…
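The deviation-triggered update described for HPR can be illustrated with a minimal Python sketch. It assumes the symbolic state is just a pair of installed and uninstalled component sets, as the abstract describes for the PSS output. The names `SymbolicState`, `deviates`, and `replan_if_needed`, the component and robot labels, and the filtering rule itself are hypothetical and chosen for illustration only; they are not the paper's replanning rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolicState:
    """Hypothetical symbolic assembly state (assumed structure, not from the paper)."""
    installed: frozenset
    uninstalled: frozenset


def deviates(observed: SymbolicState, expected: SymbolicState) -> bool:
    """Return True when the observed installed set differs from the expected one."""
    return observed.installed != expected.installed


def replan_if_needed(plan, observed, expected):
    """Return the plan unchanged unless the observed state deviates.

    `plan` is an ordered list of (component, robot) assignments. On deviation,
    this illustrative rule drops only the steps whose component is already
    installed (for example, by a human) and keeps all other assignments as-is.
    """
    if not deviates(observed, expected):
        return plan
    return [(comp, robot) for (comp, robot) in plan if comp not in observed.installed]


# Usage: a human installs beam_2 ahead of schedule.
plan = [("beam_1", "robot_A"), ("beam_2", "robot_B"), ("beam_3", "robot_A")]
expected = SymbolicState(
    installed=frozenset({"col_1"}),
    uninstalled=frozenset({"beam_1", "beam_2", "beam_3"}),
)
observed = SymbolicState(
    installed=frozenset({"col_1", "beam_2"}),
    uninstalled=frozenset({"beam_1", "beam_3"}),
)
new_plan = replan_if_needed(plan, observed, expected)
unchanged_plan = replan_if_needed(plan, expected, expected)
```

In this sketch the plan object is returned untouched when observation matches expectation, so replanning cost is paid only on a detected deviation, and the remaining assignments keep their original order and robot.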
Taxonomy
Topics: Robot Manipulation and Learning · Manufacturing Process and Optimization · AI-based Problem Solving and Planning
