Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI Coordination
Rakshit Trivedi, Kartik Sharma, David C Parkes

TL;DR
This paper introduces MIMIC, a framework that uses language-based inner speech to improve imitation learning, enabling agents to exhibit diverse, faithful, and steerable behaviors in human-AI coordination tasks.
Contribution
MIMIC is the first approach to incorporate language-based inner speech for steering and diversifying imitation learning in AI agents.
Findings
Enhanced behavioral diversity and fidelity in human-AI coordination tasks
Effective behavioral steering at inference time
No additional demonstration data needed for steering
Abstract
Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of the prominent approaches to build such agents by training them to mimic human-demonstrated behaviors. However, current methods struggle to capture the inherent diversity and non-Markovian nature of human behavior and lack the ability to steer behavior at inference time. Drawing inspiration from the theory of human cognitive processes, where inner speech guides action selection before execution, we propose MIMIC (Modeling Inner Motivations for Imitation and Control), a framework that uses language as an internal representation of behavioral intent. MIMIC employs the novel use of vision-language models as linguistic scaffolding to train a conditional variational autoencoder capable of…
Taxonomy
Topics: Social Robot Interaction and HRI · Action Observation and Synchronization · Multimodal Machine Learning Applications
