LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Cheng-Fu Yang; Yen-Chun Chen; Jianwei Yang; Xiyang Dai; Lu Yuan; Yu-Chiang Frank Wang; Kai-Wei Chang

arXiv:2310.12344 · cs.CL · October 20, 2023 · 1 citation

TL;DR

This paper introduces LACMA, which combines contrastive learning with meta-actions so that embodied instruction-following agents generalize better to unseen environments. The reported result is a 4.5% higher success rate in unseen environments.

Contribution

The paper proposes a method that explicitly aligns the agent's hidden states with the instructions via contrastive learning, and introduces meta-actions to bridge the semantic gap between high-level language instructions and the agent's low-level action space, improving generalization.
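
To make the alignment idea concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss that pulls each agent state toward its paired instruction embedding and pushes it away from the other instructions in the batch. This is an assumption-laden illustration, not the objective from the paper: the function names `cosine` and `info_nce`, the temperature value, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(states, instructions, temperature=0.1):
    """Mean InfoNCE loss, where states[i] is paired with instructions[i].

    Hypothetical helper: every other instruction in the batch is
    treated as a negative for states[i].
    """
    losses = []
    for i, state in enumerate(states):
        logits = [cosine(state, instr) / temperature for instr in instructions]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: the loss is small when pairs match and large when they do not.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this sketch a lower loss means the states sit closer to their own instructions than to the others, which is the kind of alignment the contribution describes.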

Findings

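01. Achieves a 4.5% absolute gain in success rate in unseen environments

02. The contrastive objective and meta-actions are complementary; the best results come from using both

03. The resulting agent aligns its states more closely with the corresponding instructions, making it better suited to real-world embodied agents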

Abstract

End-to-end Transformers have demonstrated an impressive success rate for Embodied Instruction Following when the environment has been seen in training. However, they tend to struggle when deployed in an unseen environment. This lack of generalizability is due to the agent's insensitivity to subtle changes in natural language instructions. To mitigate this issue, we propose explicitly aligning the agent's hidden states with the instructions via contrastive learning. Nevertheless, the semantic gap between high-level language instructions and the agent's low-level action space remains an obstacle. Therefore, we further introduce a novel concept of meta-actions to bridge the gap. Meta-actions are ubiquitous action patterns that can be parsed from the original action sequence. These patterns represent higher-level semantics that are intuitively aligned closer to the instructions. When…
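
As a rough illustration of parsing higher-level patterns out of a low-level action sequence, the sketch below collapses consecutive actions that share a coarse label into a single (label, run length) entry. The label mapping `PATTERN_OF`, the function `parse_meta_actions`, and the action names are hypothetical and chosen only for the example; the paper's actual meta-action definitions and parsing rules are not given in the text above.

```python
from itertools import groupby

# Hypothetical mapping from low-level actions to coarser pattern labels.
# These names are illustrative, not the paper's meta-action vocabulary.
PATTERN_OF = {
    "MoveAhead": "MoveForward",
    "RotateLeft": "TurnLeft",
    "RotateRight": "TurnRight",
    "PickupObject": "Interact",
    "PutObject": "Interact",
}


def parse_meta_actions(actions):
    """Collapse runs of actions with the same coarse label into (label, count)."""
    labels = [PATTERN_OF.get(action, "Other") for action in actions]
    return [(label, len(list(run))) for label, run in groupby(labels)]


trajectory = [
    "MoveAhead", "MoveAhead", "MoveAhead",
    "RotateLeft",
    "MoveAhead",
    "PickupObject",
]
meta = parse_meta_actions(trajectory)
print(meta)
```

The shorter sequence of pattern labels is closer in granularity to an instruction such as "walk forward, turn left, and pick up the object" than the raw per-step actions are, which is the intuition behind using such patterns as an extra training signal.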

Code & Models

Repositories

joeyy5588/lacma (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding