Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data
Minori Toyoda, Kanata Suzuki, Yoshihiko Hayashi, Tetsuya Ogata

TL;DR
This paper presents a two-stage training approach that enables bidirectional translation between descriptions and actions with limited paired data: the model is pre-trained on large non-paired datasets and then fine-tuned on small paired datasets.
Contribution
It introduces a novel two-stage training method that uses large non-paired data for pre-training and small paired data for fine-tuning to achieve bidirectional translation.
Findings
Effective bidirectional translation with limited paired data
Intermediate representations cluster similar actions and descriptions
Model performs well even with small amounts of paired data
Abstract
This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, and it generally requires a large dataset of comprehensive pairs of data from both modalities. However, a paired dataset is expensive to construct and difficult to collect. To address this issue, this study proposes a two-stage training method for bidirectional translation. In the proposed method, we train recurrent autoencoders (RAEs) for descriptions and actions with a large amount of non-paired data. Then, we fine-tune the entire model to bind their intermediate representations using small paired data. Because the data used for pre-training do not require pairing, behavior-only data or a large language corpus can be used. We…
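The two-stage idea in the abstract can be illustrated with a minimal sketch, assuming NumPy and toy data. It is not the authors' implementation: it swaps the paper's recurrent autoencoders (RAEs) for linear autoencoders on fixed-length vectors, and it assumes a squared-distance binding loss between the two latent codes. The data generator, the helper names (`pretrain`, `finetune`, `translate`), and all hyperparameters are hypothetical. Stage 1 trains each autoencoder on its own non-paired data; stage 2 fine-tunes both on a small paired set while pulling paired latents together, after which one modality's encoder can feed the other's decoder.

```python
import numpy as np


def make_data(rng, n, p_desc, p_act, noise=0.01):
    # Hypothetical toy data: both modalities are noisy linear views of a
    # shared 2-D latent "meaning" s. Real descriptions/actions are sequences.
    s = rng.standard_normal((n, 2))
    desc = s @ p_desc + noise * rng.standard_normal((n, p_desc.shape[1]))
    act = s @ p_act + noise * rng.standard_normal((n, p_act.shape[1]))
    return desc, act


class LinearAE:
    # Simplification: a linear autoencoder stands in for the paper's RAE.
    def __init__(self, rng, n_in, n_lat=2, scale=0.3):
        self.enc = scale * rng.standard_normal((n_in, n_lat))
        self.dec = scale * rng.standard_normal((n_lat, n_in))

    def encode(self, x):
        return x @ self.enc

    def rec_grads(self, x):
        # Mean squared reconstruction loss, plus its gradients w.r.t. the
        # latent z (chained into the encoder later) and the decoder weights.
        z = self.encode(x)
        r = z @ self.dec - x
        g_out = 2.0 * r / len(x)
        return float(np.sum(r ** 2)) / len(x), g_out @ self.dec.T, z.T @ g_out


def pretrain(ae, x, lr=0.02, steps=3000):
    # Stage 1: reconstruction only, on single-modality (non-paired) data.
    for _ in range(steps):
        _, g_z, g_dec = ae.rec_grads(x)
        ae.enc -= lr * (x.T @ g_z)
        ae.dec -= lr * g_dec


def finetune(ae_d, ae_a, d, a, lam=1.0, lr=0.02, steps=3000):
    # Stage 2: joint fine-tuning on a small paired set. The binding term
    # (an assumed squared distance) pulls paired latent codes together.
    losses = []
    for _ in range(steps):
        loss_d, gz_d, gdec_d = ae_d.rec_grads(d)
        loss_a, gz_a, gdec_a = ae_a.rec_grads(a)
        diff = ae_d.encode(d) - ae_a.encode(a)
        g_bind = 2.0 * lam * diff / len(d)
        losses.append(loss_d + loss_a + lam * float(np.sum(diff ** 2)) / len(d))
        ae_d.enc -= lr * (d.T @ (gz_d + g_bind))
        ae_d.dec -= lr * gdec_d
        ae_a.enc -= lr * (a.T @ (gz_a - g_bind))
        ae_a.dec -= lr * gdec_a
    return losses


def translate(src_ae, tgt_ae, x):
    # Encode with one modality's encoder, decode with the other's decoder.
    return src_ae.encode(x) @ tgt_ae.dec


def mse(x, y):
    return float(np.mean(np.sum((x - y) ** 2, axis=1)))


def run(seed=0):
    rng = np.random.default_rng(seed)
    p_desc = rng.standard_normal((2, 6)) / np.sqrt(6)
    p_act = rng.standard_normal((2, 5)) / np.sqrt(5)
    d_big, _ = make_data(rng, 1000, p_desc, p_act)  # descriptions only
    _, a_big = make_data(rng, 1000, p_desc, p_act)  # actions only
    d_pair, a_pair = make_data(rng, 20, p_desc, p_act)   # small paired set
    d_test, a_test = make_data(rng, 200, p_desc, p_act)  # held-out pairs

    ae_d, ae_a = LinearAE(rng, 6), LinearAE(rng, 5)
    pretrain(ae_d, d_big)
    pretrain(ae_a, a_big)
    before = (mse(translate(ae_d, ae_a, d_test), a_test),
              mse(translate(ae_a, ae_d, a_test), d_test))
    losses = finetune(ae_d, ae_a, d_pair, a_pair)
    after = (mse(translate(ae_d, ae_a, d_test), a_test),
             mse(translate(ae_a, ae_d, a_test), d_test))
    return {"before": before, "after": after, "losses": losses}


if __name__ == "__main__":
    result = run()
    print("desc->act / act->desc MSE before:", result["before"])
    print("desc->act / act->desc MSE after: ", result["after"])
```

Because translation simply pairs one modality's encoder with the other's decoder, the same `translate` helper works in both directions. Stage 1 only ever sees single-modality data, which is the property that lets behavior-only data or a text corpus be used for pre-training; in this toy setup, the separately pre-trained latent spaces are unaligned until the paired fine-tuning step binds them.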
Taxonomy
Methods: Regularized Autoencoders
