Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

Xingpeng Sun; Haoming Meng; Souradip Chakraborty; Amrit Singh Bedi; Aniket Bera

arXiv:2402.03494 · cs.AI · November 12, 2024 · 1 citation

Open Access

TL;DR

This paper introduces 'Beyond Text,' a method that enhances LLM decision-making in robot navigation by integrating vocal cues and audio features, leading to improved accuracy and robustness in human-robot interactions.

Contribution

The paper presents a novel approach that combines audio transcription with paralinguistic features to improve LLM performance in social navigation tasks.
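
To make the idea of pairing a transcript with paralinguistic features concrete, here is a minimal sketch of how the two might be merged into a single text prompt for an LLM. The feature names (`mean_pitch_hz`, `loudness_db`, `speech_rate_wps`, `pause_ratio`), the `VocalCues` class, and the prompt wording are all illustrative assumptions; the summary above does not state which features the paper uses or how they are passed to the model.

```python
from dataclasses import dataclass


@dataclass
class VocalCues:
    """Hypothetical paralinguistic features; not the paper's actual feature set."""
    mean_pitch_hz: float
    loudness_db: float
    speech_rate_wps: float  # words per second
    pause_ratio: float      # fraction of the clip that is silence, in [0, 1]


def describe_cues(cues: VocalCues) -> str:
    """Render the numeric cues as one line of text an LLM can read."""
    return (
        f"pitch={cues.mean_pitch_hz:.0f} Hz, "
        f"loudness={cues.loudness_db:.1f} dB, "
        f"rate={cues.speech_rate_wps:.1f} words/s, "
        f"pauses={cues.pause_ratio:.0%}"
    )


def build_prompt(transcript: str, cues: VocalCues) -> str:
    """Combine the transcript with the cue description (assumed prompt format)."""
    return (
        "A person gave the robot a spoken navigation instruction.\n"
        f'Transcript: "{transcript}"\n'
        f"Vocal cues: {describe_cues(cues)}\n"
        "Question: how certain does the speaker sound, "
        "and should the robot follow the instruction?"
    )


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = VocalCues(mean_pitch_hz=142.0, loudness_db=58.0,
                        speech_rate_wps=1.5, pause_ratio=0.35)
    print(build_prompt("Um, I think it's on the left?", example))
```

The sketch keeps feature extraction out of scope: in practice the numbers would come from an audio-analysis step, and the resulting prompt would be sent to whichever LLM is being evaluated.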

Findings

1. Achieves a 70.26% winning rate, outperforming existing LLMs by up to 48.30%.

2. Enhances robustness against token manipulation attacks, with a 22.44% smaller decrease in success rate.

3. Advances human-robot interaction by integrating audio cues with text-based guidance.

Abstract

While LLMs excel at processing text in human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text: an approach that improves LLM decision-making by integrating audio transcription along with a subset of these features, which focus on affect and are more relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and…

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Robotics and Automated Systems

Methods: Focus