Learning Visual-Audio Representations for Voice-Controlled Robots

Peixin Chang; Shuijing Liu; D. Livingston McPherson; Katherine Driggs-Campbell

arXiv:2109.02823 · cs.RO · March 7, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces a visual-audio representation learning pipeline for voice-controlled robots that outperforms previous methods with fewer labels and generalizes better across robots and tasks.

Contribution

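The authors propose a pipeline that first learns a visual-audio representation (VAR) associating images with sound commands, then trains the robot via reinforcement learning using the reward generated by the VAR. This reduces label dependence and improves adaptability across robotic platforms.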

Findings

01. Outperforms previous methods with fewer labels

02. Works across various robots and tasks

03. Self-improves in unseen scenarios with limited new data

Abstract

Based on recent advancements in representation learning, we propose a novel pipeline for task-oriented voice-controlled robots with raw sensor inputs. Previous methods rely on a large number of labels and task-specific reward functions. Such approaches can hardly be improved after deployment and generalize poorly across robotic platforms and tasks. To address these problems, our pipeline first learns a visual-audio representation (VAR) that associates images and sound commands. Then the robot learns to fulfill the sound command via reinforcement learning using the reward generated by the VAR. We demonstrate our approach with various sound types, robots, and tasks. We show that our method outperforms previous work with far fewer labels. We show in both the simulated and real-world experiments that the system can self-improve in previously unseen…
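The second stage described in the abstract, where the VAR generates the reward for reinforcement learning, can be illustrated with a minimal sketch. The abstract does not state how the reward is computed, so the cosine-similarity reward below is an assumption, and the names `cosine_similarity` and `var_reward` as well as the toy two-dimensional embeddings are hypothetical. In the actual pipeline the embeddings would come from learned image and sound encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def var_reward(image_embedding, sound_embedding):
    """Hypothetical reward: how well the current image matches the sound command.

    Assumption: both embeddings live in one shared space, so a higher
    similarity means the scene is closer to what the command asks for.
    """
    return cosine_similarity(image_embedding, sound_embedding)


# Toy embeddings standing in for encoder outputs (illustrative values only).
sound_goal = [1.0, 0.0]   # embedding of the sound command
image_far = [0.0, 1.0]    # scene unrelated to the command
image_near = [0.9, 0.1]   # scene close to fulfilling the command

reward_far = var_reward(image_far, sound_goal)
reward_near = var_reward(image_near, sound_goal)
```

Under this assumption the reward needs no task-specific hand-crafted function: any reinforcement learning algorithm could use `var_reward` as its per-step signal, with `reward_near` exceeding `reward_far` in the toy case.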

Code & Models

Repositories

PeixinC/VoiceControlledRobot-VAR (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis