Language Model-Based Paired Variational Autoencoders for Robotic Language Learning
Ozan Özdemir, Matthias Kerzel, Cornelius Weber, Jae Hee Lee, Stefan Wermter

TL;DR
This paper introduces a neural model that links robot actions with language descriptions, leveraging variational autoencoders and pretrained language models to improve language understanding and generalization in robotic learning.
Contribution
The work extends previous PVAE models by integrating BERT, enabling robots to understand and generate natural language descriptions beyond predefined vocabularies.
Findings
The variational autoencoder (VAE) outperforms standard autoencoders at binding actions to their language descriptions.
The channel-separated visual feature extraction module copes with objects of different shapes.
BERT integration allows understanding of unconstrained natural language.
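As a rough illustration of the first finding, the difference between a standard autoencoder and a variational one can be sketched at the latent layer: the former uses the encoder output directly as the code, while the latter treats it as the mean and log-variance of a Gaussian, samples from it, and adds a KL penalty. This is a generic textbook sketch in plain Python, not the PVAE implementation; the function names and the toy values of `mu` and `logvar` are assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


# Hypothetical encoder outputs for one sample (illustrative values only).
mu = [0.3, -1.2]
logvar = [-0.5, 0.1]

z_ae = list(mu)                          # standard autoencoder: deterministic code
z_vae = reparameterize(mu, logvar)       # VAE: stochastic code around mu
kl = kl_to_standard_normal(mu, logvar)   # regulariser added to the reconstruction loss
```

Because the code is sampled, a single input can decode to several nearby outputs, which is one plausible reading of why a variational latent helps with producing alternative descriptions of the same action; the summary above does not spell out the mechanism.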
Abstract
Human infants learn language while interacting with their environment in which their caregivers may describe the objects and actions they perform. Similar to human infants, artificial agents can learn language while interacting with their environment. In this work, first, we present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object manipulation scenario. Building on our previous Paired Variational Autoencoders (PVAE) model, we demonstrate the superiority of the variational autoencoder over standard autoencoders by experimenting with cubes of different colours, and by enabling the production of alternative vocabularies. Additional experiments show that the model's channel-separated visual feature extraction module can cope with objects of different shapes. Next, we introduce PVAE-BERT, which equips the model with a pretrained…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
