Understanding Acoustic Patterns of Human Teachers Demonstrating Manipulation Tasks to Robots
Akanksha Saran, Kush Desai, Mai Lee Chang, Rudolf Lioutikov, Andrea Thomaz, Scott Niekum

TL;DR
This paper investigates the acoustic signals from human teachers demonstrating manipulation tasks to robots, analyzing speech features to understand the conveyed information and its potential to enhance robot learning from demonstrations.
Contribution
It characterizes the acoustic features of human demonstrations, revealing how speech conveys semantic content and expressive cues across different teaching conditions.
Findings
Teachers convey similar semantic concepts across conditions.
Speech duration and expressiveness vary with demonstration context.
Audio signals contain rich information beneficial for robot learning.
Abstract
Humans use audio signals in the form of spoken language or verbal reactions effectively when teaching new skills or tasks to other humans. While demonstrations allow humans to teach robots in a natural way, learning from trajectories alone does not leverage other available modalities including audio from human teachers. To effectively utilize audio cues accompanying human demonstrations, first it is important to understand what kind of information is present and conveyed by such cues. This work characterizes audio from human teachers demonstrating multi-step manipulation tasks to a situated Sawyer robot using three feature types: (1) duration of speech used, (2) expressiveness in speech or prosody, and (3) semantic content of speech. We analyze these features along four dimensions and find that teachers convey similar semantic concepts via spoken words for different conditions of (1)…
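As an illustration of the first feature type (duration of speech), the sketch below estimates how long a recording contains speech-like energy. It is a minimal example under stated assumptions, not the authors' method: the energy-threshold approach, the function names `frame_rms` and `speech_duration`, and the `frame_ms` and `rms_threshold` defaults are all hypothetical choices made for this example. It assumes mono audio samples scaled to the range [-1, 1].

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS energy of each non-overlapping, full-length frame."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(x * x for x in frame) / frame_len)


def speech_duration(samples, sample_rate, frame_ms=25, rms_threshold=0.02):
    """Estimate seconds of speech-like audio with a simple energy threshold.

    Assumption: a frame counts as speech when its RMS exceeds rms_threshold.
    The threshold and frame size are illustrative defaults, not values
    taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    voiced_frames = sum(
        1 for rms in frame_rms(samples, frame_len) if rms > rms_threshold
    )
    return voiced_frames * frame_len / sample_rate
```

A real pipeline would likely replace the fixed threshold with a proper voice activity detector, since background noise from the robot or the room can exceed a static energy level. Prosody and semantic content, the other two feature types, would need pitch tracking and speech transcription respectively and are not covered by this sketch.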
Taxonomy
Topics: Speech and dialogue systems
