Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions?
Lennart Wachowiak, Andrew Coles, Oya Celiktutan, Gerard Canal

TL;DR
This study evaluates whether large language models, especially GPT-4, can accurately reflect human social preferences and judgments in human-robot interaction scenarios, highlighting their strengths and limitations.
Contribution
The paper reproduces three HRI user studies with LLMs in place of participants, shows that GPT-4's answers correlate strongly with human social judgments in two of them, and assesses the capabilities and limitations of LLMs and vision models in social robotics contexts.
Findings
GPT-4's answers correlate strongly with human judgments of appropriate robot communication and of behavior desirability.
Vision models fail to interpret video stimuli effectively.
LLMs tend to overrate certain social acts compared to humans.
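As a rough illustration of what "correlates strongly with human judgments" can mean in practice, the sketch below computes a rank correlation between per-scenario human ratings and model ratings. It assumes Spearman's rank correlation as the measure, which this page does not confirm, and the ratings are hypothetical placeholders, not data from the paper. The function names (`average_ranks`, `pearson`, `spearman`) are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean ratings for five scenarios (NOT data from the paper).
human_ratings = [4.5, 2.0, 3.5, 1.0, 5.0]
model_ratings = [3.0, 2.5, 4.0, 1.5, 5.0]

rho = spearman(human_ratings, model_ratings)
print(f"Spearman correlation: {rho:.2f}")
```

A rank-based measure only asks whether the model orders the scenarios the way participants do, so it is insensitive to a model that rates everything slightly higher or lower than humans on the same scale.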
Abstract
Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning. Meanwhile, many robotics applications involve human supervisors or collaborators. Hence, it is crucial for LLMs to generate socially acceptable actions that align with people's preferences and values. In this work, we test whether LLMs capture people's intuitions about behavior judgments and communication preferences in human-robot interaction (HRI) scenarios. For evaluation, we reproduce three HRI user studies, comparing the output of LLMs with that of real participants. We find that GPT-4 strongly outperforms other models, generating answers that correlate strongly with users' answers in two studies: the first study dealing with selecting the most appropriate communicative act for a robot in various situations (correlation of 0.82), and the second with judging the…
Taxonomy
Topics: Social Robot Interaction and HRI · AI in Service Interactions · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Dropout · Softmax · Residual Connection
