Multimodal Interactive Learning of Primitive Actions
Tuan Do, Nikhil Krishnaswamy, Kyeongmin Rim, and James Pustejovsky

TL;DR
This paper presents a framework for teaching primitive actions to machines through multimodal interaction, combining live demonstration with natural language communication so that the learned model can be fine-tuned from few training samples.
Contribution
The paper introduces a multimodal interactive learning framework that integrates live demonstration and natural language communication to teach actions from few samples.
Findings
Enhanced learning from limited demonstrations.
Improved model fine-tuning through human-computer interaction.
Effective use of two distinct teaching modalities: live demonstration and natural language.
Abstract
We describe an ongoing project in learning to perform primitive actions from demonstrations using an interactive interface. In our previous work, we have used demonstrations captured from humans performing actions as training samples for a neural network-based trajectory model of actions to be performed by a computational agent in novel setups. We found that our original framework had some limitations that we hope to overcome by incorporating communication between the human and the computational agent, using the interaction between them to fine-tune the model learned by the machine. We propose a framework that uses multimodal human-computer interaction to teach action concepts to machines, making use of both live demonstration and communication through natural language, as two distinct teaching modalities, while requiring few training samples.
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
