What you reward is what you learn: Comparing rewards for online speech policy optimization in public HRI
Sichao Song, Yuki Okafuji, Kaito Ariu, Amy Koike

TL;DR
This study compares how different reward signals shape online speech policy learning for a robot deployed in a public human-robot interaction (HRI) setting, and shows that the choice of reward changes both the robot's behavior and the interaction outcomes.
Contribution
The paper compares three binary rewards (user rating, conversation closure, and reaching at least two turns) for online speech policy optimization in a real-world HRI deployment, and offers practical insights and design lessons for deploying adaptive conversational robots.
Findings
Different rewards lead to distinct interaction behaviors.
User ratings and conversation closure rewards influence policy choices.
Offline analysis reveals contextual factors affecting policy performance.
Abstract
Designing policies that are both efficient and acceptable for conversational service robots in open and diverse environments is non-trivial. Unlike fixed, hand-tuned parameters, online learning can adapt to non-stationary conditions. In this paper, we study how to adapt a social robot's speech policy in the wild. During a 12-day in-situ deployment with over 1,400 public encounters, we cast online policy optimization as a multi-armed bandit problem and use Thompson sampling to select among six actions defined by speech rate (slow/normal/fast) and verbosity (concise/detailed). We compare three complementary binary rewards--Ru (user rating), Rc (conversation closure), and Rt (>=2 turns)--and show that each induces distinct arm distributions and interaction behaviors. We complement the online results with offline evaluations that analyze contextual factors (e.g., crowd level, group size)…
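The bandit formulation in the abstract can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch over the six speech arms. This is not the authors' implementation: the uniform Beta(1, 1) prior, the class and function names, and the simulated success probabilities are all assumptions made for illustration. The binary reward could stand in for any one of Ru, Rc, or Rt.

```python
import itertools
import random

# Six arms: speech rate x verbosity, as listed in the abstract.
ARMS = list(itertools.product(["slow", "normal", "fast"], ["concise", "detailed"]))


class BetaBernoulliTS:
    """Thompson sampling for binary rewards (hypothetical helper class)."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        # Assumption: uniform Beta(1, 1) prior on each arm's success rate.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select(self):
        # Draw one sample per arm from its posterior and play the largest.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # The reward must be binary, matching the 0/1 rewards in the abstract.
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def simulate(success_probs, n_rounds, seed=0):
    """Run the policy against made-up per-arm success probabilities."""
    policy = BetaBernoulliTS(len(success_probs), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(success_probs)
    for _ in range(n_rounds):
        arm = policy.select()
        reward = 1 if env.random() < success_probs[arm] else 0
        policy.update(arm, reward)
        pulls[arm] += 1
    return policy, pulls


if __name__ == "__main__":
    # Hypothetical success rates, not values from the paper.
    policy, pulls = simulate([0.1, 0.1, 0.1, 0.1, 0.1, 0.9], n_rounds=1400)
    for arm, count in zip(ARMS, pulls):
        print(arm, count)
```

Running the same loop with a different reward definition changes only the value passed to `update`, which is why each reward can produce a different distribution of pulls across arms.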
Taxonomy
Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing
