LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application
Zhe Huang, John Pohovey, Ananya Yammanuru, Katherine Driggs-Campbell

TL;DR
This paper introduces LIT, a method using large language and vision models to predict human intentions in long-term collaborative tasks, enabling proactive robot assistance in cooking scenarios.
Contribution
The paper presents a novel intention tracking approach that leverages LLMs and VLMs for proactive human-robot collaboration in long-horizon tasks.
Findings
Effective prediction of human intentions in collaborative tasks
Smooth coordination demonstrated in cooking scenarios
Proactive robot assistance improves collaboration efficiency
Abstract
Large Language Models (LLMs) and Vision Language Models (VLMs) enable robots to ground natural language prompts into control actions to achieve tasks in an open world. However, when applied to a long-horizon collaborative task, this formulation results in excessive prompting for initiating or clarifying robot actions at every step of the task. We propose Language-driven Intention Tracking (LIT), leveraging LLMs and VLMs to model the human user's long-term behavior and to predict the next human intention to guide the robot for proactive collaboration. We demonstrate smooth coordination between a LIT-based collaborative robot and the human user in collaborative cooking tasks.
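The abstract does not spell out how intention tracking is implemented, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a discrete Bayes filter over a fixed list of recipe steps: the step list, the `advance_prob` transition parameter, and the hand-written likelihood scores are all hypothetical stand-ins for what an LLM (task structure) and a VLM (scoring camera frames against step descriptions) might supply.

```python
from typing import Dict, List

# Hypothetical recipe steps; in a real system an LLM might generate these.
STEPS: List[str] = [
    "wash vegetables",
    "chop vegetables",
    "cook vegetables",
    "plate dish",
]


def predict(belief: Dict[str, float], advance_prob: float = 0.7) -> Dict[str, float]:
    """Prediction step: move probability mass toward the following step.

    advance_prob is an assumed chance that the human moves on to the
    next step between two observations.
    """
    predicted = {step: 0.0 for step in STEPS}
    for i, step in enumerate(STEPS):
        if i + 1 < len(STEPS):
            predicted[step] += (1.0 - advance_prob) * belief[step]
            predicted[STEPS[i + 1]] += advance_prob * belief[step]
        else:
            predicted[step] += belief[step]  # the final step absorbs its mass
    return predicted


def update(belief: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Update step: reweight the belief by per-step observation scores.

    likelihood stands in for VLM scores of the current frame against
    each step description; missing steps get a small floor value.
    """
    posterior = {step: belief[step] * likelihood.get(step, 1e-6) for step in STEPS}
    total = sum(posterior.values())
    return {step: p / total for step, p in posterior.items()}


def most_likely(belief: Dict[str, float]) -> str:
    """Return the step with the highest probability."""
    return max(belief, key=belief.get)


# The human starts at the first step with certainty.
belief = {step: 0.0 for step in STEPS}
belief[STEPS[0]] = 1.0

# Made-up observation scores suggesting the human is now chopping.
observation = {
    "wash vegetables": 0.10,
    "chop vegetables": 0.80,
    "cook vegetables": 0.05,
    "plate dish": 0.05,
}

belief = update(predict(belief), observation)
current_intention = most_likely(belief)
next_intention = most_likely(predict(belief))
print("current:", current_intention)
print("predicted next:", next_intention)
```

In this sketch the robot would use `next_intention` to prepare for the upcoming step (for example, fetching a pan before the human asks), which is the kind of proactive behavior the abstract describes as replacing step-by-step prompting.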
Taxonomy
Topics: Robotics and Automated Systems · Topic Modeling · AI in Service Interactions
