Bidirectional Intent Communication: A Role for Large Foundation Models
Tim Schreiter, Rishi Hazra, Jens Rüppel, Andrey Rudenko

TL;DR
This paper presents Bident, a framework that lets robots engage in bidirectional, multimodal interaction with humans, with potential assistive applications such as personalized education and healthcare.
Contribution
Bident is a multimodal, bidirectional interaction framework for robots that emphasizes human-robot cooperation in shared spaces through speech, gaze, gestures, and actions.
Findings
Supports verbal and physical interactions
Enhances human-robot cooperation in shared environments
Potential applications in education and healthcare
Abstract
Integrating multimodal foundation models has significantly enhanced autonomous agents' language comprehension, perception, and planning capabilities. However, while existing works adopt a task-centric approach with minimal human interaction, applying these models to developing assistive user-centric robots that can interact and cooperate with humans remains underexplored. This paper introduces "Bident", a framework designed to integrate robots seamlessly into shared spaces with humans. Bident enhances the interactive experience by incorporating multimodal inputs like speech and user gaze dynamics. Furthermore, Bident supports verbal utterances and physical actions like gestures, making it versatile for bidirectional human-robot interactions. Potential applications include personalized education, where robots can adapt to individual learning styles and paces, and…
Taxonomy
Topics: Semantic Web and Ontologies
