What you reward is what you learn: Comparing rewards for online speech policy optimization in public HRI

Sichao Song; Yuki Okafuji; Kaito Ariu; Amy Koike

arXiv:2601.01969 · cs.RO · January 6, 2026

Open Access

TL;DR

This study compares three binary reward signals (user rating, conversation closure, and reaching at least two turns) for online speech policy learning during a 12-day public human-robot interaction deployment, and shows that each reward induces distinct arm distributions and interaction behaviors.

Contribution

The paper compares three binary reward types (Ru, Rc, and Rt) for online speech policy optimization in real-world HRI, and offers practical insights and design lessons for deploying adaptive conversational robots.

Findings

01. Different rewards lead to distinct interaction behaviors.

02. User ratings and conversation closure rewards influence policy choices.

03. Offline analysis reveals contextual factors affecting policy performance.

Abstract

Designing policies that are both efficient and acceptable for conversational service robots in open and diverse environments is non-trivial. Unlike fixed, hand-tuned parameters, online learning can adapt to non-stationary conditions. In this paper, we study how to adapt a social robot's speech policy in the wild. During a 12-day in-situ deployment with over 1,400 public encounters, we cast online policy optimization as a multi-armed bandit problem and use Thompson sampling to select among six actions defined by speech rate (slow/normal/fast) and verbosity (concise/detailed). We compare three complementary binary rewards--Ru (user rating), Rc (conversation closure), and Rt (>=2 turns)--and show that each induces distinct arm distributions and interaction behaviors. We complement the online results with offline evaluations that analyze contextual factors (e.g., crowd level, group size)…
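
The bandit formulation in the abstract can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch over the six speech actions. This is not the authors' implementation: the uniform Beta(1, 1) priors, the class name `BetaBernoulliThompson`, the `simulate` helper, and the success rates in the usage note are all assumptions made for illustration. Only the six arms (speech rate crossed with verbosity) and the binary reward come from the abstract.

```python
import itertools
import random

# Six arms from the abstract: speech rate x verbosity.
SPEECH_RATES = ("slow", "normal", "fast")
VERBOSITY = ("concise", "detailed")
ARMS = list(itertools.product(SPEECH_RATES, VERBOSITY))


class BetaBernoulliThompson:
    """Thompson sampling for binary rewards (assumed Beta(1, 1) priors)."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + observed successes per arm
        self.beta = [1.0] * n_arms   # 1 + observed failures per arm
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each arm's posterior and play the largest.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # The reward must be binary, like Ru, Rc, or Rt in the abstract.
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def simulate(true_rates, n_rounds, seed=0):
    """Run the bandit against hypothetical per-arm success rates."""
    bandit = BetaBernoulliThompson(len(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for the simulated users
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = bandit.select()
        reward = 1 if env.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
        pulls[arm] += 1
    return bandit, pulls
```

As a usage example, `simulate([0.1, 0.1, 0.1, 0.1, 0.1, 0.9], 1400)` runs 1,400 simulated encounters with made-up success rates; the pull counts should concentrate on the last arm, `ARMS[5]`. Swapping the reward definition changes which arm accumulates successes, which is one way to picture how different rewards can lead to different arm distributions.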

Taxonomy

Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing