Multimodal representation models for prediction and control from partial information
Martina Zambelli, Antoine Cully, Yiannis Demiris

TL;DR
This paper introduces a multimodal variational autoencoder for robots that can learn from multiple sensor types, reconstruct missing data, predict states, and imitate observed trajectories, enhancing robot perception and control.
Contribution
It presents a novel multimodal variational autoencoder that handles missing sensor data and captures robot kinematics, with a new training strategy for complex multimodal learning.
Findings
High accuracy in reconstructing missing sensory modalities
Effective prediction of sensorimotor states and visual trajectories
Successful imitation of observed actions by the robot
Abstract
Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy…
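The abstract does not say how missing sensor inputs are presented to the model. As a minimal sketch of one possible approach, the snippet below concatenates all modalities into a single vector and overwrites unavailable ones with a sentinel value, while the training target would remain the complete vector. The modality names, their sizes, the sentinel value, and the helper functions `build_input` and `random_mask` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random

# Hypothetical modality layout: names and sizes are illustrative only.
MODALITY_SIZES = {"joints": 4, "vision": 2, "touch": 1}

# Assumed sentinel, chosen to lie outside a normalised data range of [-1, 1].
MISSING = -2.0


def build_input(sample, available):
    """Concatenate modalities in a fixed order, masking unavailable ones."""
    vector = []
    for name, size in MODALITY_SIZES.items():
        if name in available:
            values = list(sample[name])
            if len(values) != size:
                raise ValueError(f"{name}: expected {size} values, got {len(values)}")
            vector.extend(values)
        else:
            # Replace the whole modality with the sentinel value.
            vector.extend([MISSING] * size)
    return vector


def random_mask(rng, keep_prob=0.5):
    """Choose a random, non-empty subset of modalities to leave visible."""
    names = list(MODALITY_SIZES)
    kept = {name for name in names if rng.random() < keep_prob}
    if not kept:
        kept = {rng.choice(names)}
    return kept


sample = {
    "joints": [0.1, -0.4, 0.7, 0.0],
    "vision": [0.3, -0.1],
    "touch": [1.0],
}

# The reconstruction target is always the complete vector.
target = build_input(sample, set(MODALITY_SIZES))

# The model input hides everything except vision in this example.
partial = build_input(sample, {"vision"})

# During training, a different subset could be hidden for each sample.
rng = random.Random(0)
training_input = build_input(sample, random_mask(rng))
```

Under this scheme, a model trained to map masked inputs to complete targets is pushed to infer the hidden modalities from the visible ones, which is the behaviour the abstract lists as capability (1).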
