Gaze-supported Large Language Model Framework for Bi-directional Human-Robot Interaction

Jens V. Rüppel; Andrey Rudenko; Tim Schreiter; Martin Magnusson; Achim J. Lilienthal

arXiv:2507.15729·cs.RO·July 22, 2025

TL;DR

This paper introduces a gaze- and speech-informed LLM-based framework for bi-directional human-robot interaction, enhancing adaptability and user engagement in assistive robotics through real-time perception and modular design.

Contribution

It presents a novel modular, gaze- and speech-supported LLM framework for bi-directional HRI, capable of real-time perception and adaptable to diverse tasks and robots.

Findings

01

LLM-based system improves adaptability and user engagement.

02

System marginally enhances task execution metrics.

03

Scripted pipeline remains effective for simple tasks.

Abstract

The rapid development of Large Language Models (LLMs) creates exciting potential for flexible, general knowledge-driven Human-Robot Interaction (HRI) systems for assistive robots. Existing HRI systems demonstrate great progress in interpreting and following user instructions, action generation, and robot task solving. On the other hand, bi-directional, multi-modal, and context-aware support of the user in collaborative tasks remains an open challenge. In this paper, we present a gaze- and speech-informed interface to the assistive robot, which is able to perceive the working environment from multiple vision inputs and support the dynamic user in their tasks. Our system is designed to be modular and transferable to adapt to diverse tasks and robots, and it is capable of real-time use of language-based interaction state representation and fast onboard perception modules. Its…
