Multimodal representation models for prediction and control from partial information

Martina Zambelli; Antoine Cully; Yiannis Demiris

arXiv:1910.03854 · cs.RO · October 10, 2019

TL;DR

This paper introduces a multimodal variational autoencoder for robots that can learn from multiple sensor types, reconstruct missing data, predict states, and imitate observed trajectories, enhancing robot perception and control.

Contribution

It presents a novel multimodal variational autoencoder that handles missing sensor data and captures robot kinematics, with a new training strategy for complex multimodal learning.

Findings

01. High accuracy in reconstructing missing sensory modalities

02. Effective prediction of sensorimotor states and visual trajectories

03. Successful imitation of observed actions by the robot

Abstract

Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy…
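
The abstract does not say how missing modalities are presented to the model, so the following is only a minimal sketch of one common approach, not the authors' method: each modality is concatenated into a single input vector, a dropped modality is overwritten with a sentinel value, and the autoencoder is trained to reproduce the complete vector. The modality names, their sizes, the sentinel value `-2.0`, and the drop probability are all assumptions made for illustration.

```python
import random

# Hypothetical modality names and sizes, chosen only for illustration.
MODALITY_DIMS = {"joints": 4, "vision": 3, "touch": 2}
# Assumed sentinel for "missing"; it lies outside a [-1, 1] normalised range.
MISSING = -2.0


def concat(sample):
    """Flatten a dict of modality vectors into one vector, in a fixed order."""
    flat = []
    for name, dim in MODALITY_DIMS.items():
        values = sample[name]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        flat.extend(values)
    return flat


def mask_modalities(sample, missing):
    """Return a copy of `sample` with the listed modalities set to the sentinel."""
    return {
        name: [MISSING] * len(values) if name in missing else list(values)
        for name, values in sample.items()
    }


def make_training_pair(sample, rng, p_drop=0.5):
    """Build (partial input, complete target) for one training example.

    Each modality is dropped independently with probability `p_drop`, but at
    least one modality is always kept so the input is never fully masked.
    """
    names = list(MODALITY_DIMS)
    dropped = {n for n in names if rng.random() < p_drop}
    if len(dropped) == len(names):
        dropped.discard(rng.choice(names))
    return concat(mask_modalities(sample, dropped)), concat(sample)


if __name__ == "__main__":
    sample = {
        "joints": [0.1, 0.2, 0.3, 0.4],
        "vision": [0.5, 0.6, 0.7],
        "touch": [0.8, 0.9],
    }
    partial, target = make_training_pair(sample, random.Random(0))
    print(partial)
    print(target)
```

Under this scheme the same masking function can be reused at test time: to query the model for an unobserved modality, that modality is masked in the input and read back from the reconstruction.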
