Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura

TL;DR
This paper introduces Prototypical Contrastive Transfer Learning (PCTL), a novel approach that leverages simulation data to improve multimodal language understanding for domestic robots, significantly enhancing object identification accuracy from natural language instructions.
Contribution
The paper presents a new transfer learning framework with a contrastive loss, Dual ProtoNCE, specifically designed for multimodal language understanding in domestic robot applications.
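This summary does not give the form of Dual ProtoNCE, so the following is only a minimal sketch of a generic prototype-based contrastive (InfoNCE-style) term, not the paper's loss. The function name `prototype_nce_loss`, the `temperature` value, and the toy vectors are all assumptions made for illustration: each embedding is scored against every prototype by cosine similarity, and its assigned prototype is treated as the positive.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prototype_nce_loss(embeddings, assignments, prototypes, temperature=0.1):
    """Generic prototype contrastive loss (hypothetical helper, not Dual ProtoNCE).

    For each embedding, computes softmax cross-entropy over cosine
    similarities to all prototypes, with the assigned prototype as the
    positive class, and returns the mean over embeddings.
    """
    protos = [l2_normalize(p) for p in prototypes]
    total = 0.0
    for emb, pos in zip(embeddings, assignments):
        z = l2_normalize(emb)
        logits = [dot(z, p) / temperature for p in protos]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[pos]
    return total / len(embeddings)


# Toy data (assumed): two embeddings and two prototypes in 2-D.
embs = [[1.0, 0.0], [0.0, 1.0]]
protos = [[1.0, 0.0], [0.0, 1.0]]
good = prototype_nce_loss(embs, [0, 1], protos)  # each embedding matches its prototype
bad = prototype_nce_loss(embs, [1, 0], protos)   # assignments swapped
```

In this toy setup the loss is close to zero when each embedding is assigned to the prototype it points toward, and much larger when the assignments are swapped, which is the behavior a prototype-based contrastive objective rewards.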
Findings
PCTL outperforms existing methods in object identification accuracy.
PCTL achieves 78.1% accuracy, compared with 73.4% for simple fine-tuning, an improvement of 4.7 percentage points.
New datasets were created for real-world and simulation environments.
Abstract
Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework. In this study, we propose a novel transfer learning approach for multimodal language understanding called Prototypical Contrastive Transfer Learning (PCTL), which uses a new contrastive loss called Dual ProtoNCE. We introduce PCTL to the task of identifying target objects in domestic environments according to free-form natural language instructions. To validate PCTL, we built new…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems
