"Can you be my mum?": Manipulating Social Robots in the Large Language Models Era
Giulio Antonio Abbo, Gloria Desideri, Tony Belpaeme, Micol Spitale

TL;DR
This study investigates how users attempt to manipulate large language model-powered social robots to bypass safety features, revealing techniques and informing future safeguards for ethical human-robot interactions.
Contribution
It provides empirical insights into manipulation techniques used by users to bypass safety measures in social robots powered by large language models.
Findings
Participants used five manipulation techniques, including emotional appeals.
Users attempted to induce the robot to violate ethical principles.
The study highlights vulnerabilities in current safety mechanisms.
Abstract
Recent advancements in robots powered by large language models have enhanced their conversational abilities, enabling interactions closely resembling human dialogue. However, these models introduce safety and security concerns in human-robot interaction (HRI), as they are vulnerable to manipulation that can bypass built-in safety measures. Imagining a social robot deployed in a home, this work aims to understand how everyday users try to exploit a language model to violate ethical principles, such as by prompting the robot to act like a life partner. We conducted a pilot study involving 21 university students who interacted with a Misty robot, attempting to circumvent its safety mechanisms across three scenarios based on specific HRI ethical principles: attachment, freedom, and empathy. Our results reveal that participants employed five techniques, including insulting and appealing to pity using emotional…
Taxonomy
Topics: Topic Modeling
