Gaze-supported Large Language Model Framework for Bi-directional Human-Robot Interaction
Jens V. Rüppel, Andrey Rudenko, Tim Schreiter, Martin Magnusson, Achim J. Lilienthal

TL;DR
This paper introduces a gaze- and speech-informed LLM-based framework for bi-directional human-robot interaction, enhancing adaptability and user engagement in assistive robotics through real-time perception and modular design.
Contribution
It presents a novel modular, gaze- and speech-supported LLM framework for bi-directional HRI, capable of real-time perception and adaptable to diverse tasks and robots.
Findings
The LLM-based system improves adaptability and user engagement.
The system marginally improves task execution metrics.
A scripted pipeline remains effective for simple tasks.
Abstract
The rapid development of Large Language Models (LLMs) creates exciting potential for flexible, general knowledge-driven Human-Robot Interaction (HRI) systems for assistive robots. Existing HRI systems demonstrate great progress in interpreting and following user instructions, action generation, and robot task solving. On the other hand, bi-directional, multi-modal, and context-aware support of the user in collaborative tasks still remains an open challenge. In this paper, we present a gaze- and speech-informed interface to the assistive robot, which is able to perceive the working environment from multiple vision inputs and support the dynamic user in their tasks. Our system is designed to be modular and transferable to adapt to diverse tasks and robots, and it is capable of real-time use of language-based interaction state representation and fast onboard perception modules. Its…
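The abstract mentions a language-based interaction state representation fed by gaze, speech, and multiple vision inputs, but does not describe its format here. As a minimal sketch, assuming the state is serialized as plain text for the LLM prompt, the Python below shows one way a gaze target, a speech transcript, and detections from several cameras could be merged into a single textual state. All class, field, method, and camera names (`InteractionState`, `Detection`, `to_prompt`, `head_cam`, `table_cam`) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Detection:
    """One object reported by a hypothetical perception module."""
    label: str
    camera: str  # which vision input produced this detection (illustrative)


@dataclass
class InteractionState:
    """Hypothetical language-based interaction state; field names are assumptions."""
    task: str
    detections: List[Detection] = field(default_factory=list)
    gaze_target: Optional[str] = None  # label of the object the user is looking at
    utterance: Optional[str] = None    # latest transcribed user speech
    history: List[str] = field(default_factory=list)  # previous dialogue turns

    def to_prompt(self) -> str:
        """Serialize the multimodal state into plain text for an LLM prompt."""
        # Merge detections from all cameras, dropping duplicate labels.
        objects = ", ".join(sorted({d.label for d in self.detections})) or "none"
        lines = [
            f"Task: {self.task}",
            f"Visible objects: {objects}",
            f"User is looking at: {self.gaze_target or 'unknown'}",
            f"User said: {self.utterance or '(nothing)'}",
        ]
        if self.history:
            # Keep only the three most recent turns to bound prompt length.
            lines.append("Recent dialogue:")
            lines.extend(f"- {turn}" for turn in self.history[-3:])
        return "\n".join(lines)


# Usage: the same object seen by two cameras appears once in the prompt.
state = InteractionState(
    task="assemble the shelf",
    detections=[
        Detection("screwdriver", "head_cam"),
        Detection("screw", "table_cam"),
        Detection("screw", "head_cam"),
    ],
    gaze_target="screwdriver",
    utterance="Where does this go?",
)
print(state.to_prompt())
```

Deduplicating labels across cameras keeps the prompt compact; a fuller representation could also carry object positions, gaze timestamps, or robot state, which this sketch omits.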
