CAVER: Curious Audiovisual Exploring Robot

Luca Macesanu; Boueny Folefack; Samik Singh; Ruchira Ray; Ben Abbatematteo; Roberto Martín-Martín

arXiv:2511.07619 · cs.RO · March 9, 2026

TL;DR

CAVER is a robot that actively explores objects through curiosity-driven interaction to learn rich audiovisual representations, which improve material classification and enable imitation of audio-only demonstrations.

Contribution

The paper introduces an audiovisual exploration robot built on three components: a new 3D-printed end-effector, a combined audiovisual representation, and a curiosity-based exploration algorithm, advancing multimodal robotic perception.

Findings

01. CAVER efficiently builds audiovisual representations with fewer interactions.

02. The learned representations improve material classification accuracy.

03. CAVER successfully imitates audio-only human demonstrations.

Abstract

Multimodal audiovisual perception can enable new avenues for robotic manipulation, from better material classification to the imitation of demonstrations for which only audio signals are available (e.g., playing a tune by ear). However, to unlock such multimodal potential, robots need to learn the correlations between an object's visual appearance and the sound it generates when they interact with it. Such an active sensorimotor experience requires new interaction capabilities, representations, and exploration methods to guide the robot in efficiently building increasingly rich audiovisual knowledge. In this work, we present CAVER, a novel robot that builds and utilizes rich audiovisual representations of objects. CAVER includes three novel contributions: 1) a novel 3D printed end-effector, attachable to parallel grippers, that excites objects' audio responses, 2) an audiovisual…

Taxonomy

Topics: Social Robot Interaction and HRI · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning