A Human-in-the-loop Approach to Robot Action Replanning through LLM Common-Sense Reasoning
Elena Merlo, Marta Lagomarsino, Arash Ajoudani

TL;DR
This paper introduces a human-in-the-loop system that uses LLMs and visual input to improve robot action plans from a single demonstration, enabling intuitive corrections and adaptations for better robustness.
Contribution
It presents a novel approach combining LLM common-sense reasoning with visual data to automatically refine robot plans based on user input and critical task aspects.
Findings
System effectively corrects vision-based errors
Enables plan adaptation without extra demonstrations
Improves robustness through interactive refinements
Abstract
To facilitate the wider adoption of robotics, accessible programming tools are required for non-experts. Observational learning enables intuitive transfer of human skills through hands-on demonstrations, but relying solely on visual input can be inefficient in terms of scalability and failure mitigation, especially when based on a single demonstration. This paper presents a human-in-the-loop method that enhances the robot execution plan, automatically generated from a single RGB video, with natural language input to a Large Language Model (LLM). By including user-specified goals or critical task aspects and exploiting the LLM's common-sense reasoning, the system adjusts the vision-based plan to prevent potential failures and adapts it based on the received instructions. Experiments demonstrated the framework's intuitiveness and effectiveness in correcting vision-derived errors and adapting…
