MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

Liuyi Wang; Zongtao He; Mengjiao Shen; Jingwei Yang; Chengju Liu; Qijun Chen

arXiv:2406.17960 · cs.CV · June 27, 2024 · 1 citation

TL;DR

This paper introduces MAGIC, a novel knowledge distillation framework for creating lightweight, efficient vision-and-language navigation models that outperform previous methods and are suitable for real-time robotics applications.

Contribution

MAGIC combines meta-ability guided distillation with interactive chain learning, enabling effective multi-step teacher-student co-evolution for VLN tasks.

Findings

01. MAGIC-S, with only 5% of the teacher's size, outperforms previous methods.

02. MAGIC-L surpasses the previous state of the art by 5.84% in SPL (Success weighted by Path Length) and 3.18% in SR (Success Rate).

03. The method demonstrates superior real-time performance on a new dataset.
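
Finding 02 is reported in SR and SPL. For readers unfamiliar with these navigation metrics, the sketch below computes both, using the standard definition of SPL: each episode's success flag is multiplied by the shortest-path length divided by the larger of the taken and shortest path lengths, and the result is averaged over episodes. The function names and episode values are hypothetical illustrations, not taken from the paper or its repository.

```python
def success_rate(successes):
    """SR: fraction of episodes in which the agent reached the goal."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """SPL: success weighted by path efficiency, averaged over episodes."""
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        # A failed episode contributes 0; a success is discounted
        # when the agent's path is longer than the shortest path.
        total += s * shortest / max(taken, shortest)
    return total / len(successes)


# Hypothetical episodes (lengths in meters); not data from the paper.
successes = [1, 1, 0, 1]
shortest = [10.0, 8.0, 12.0, 5.0]
taken = [10.0, 16.0, 9.0, 5.0]

sr = success_rate(successes)
spl_value = spl(successes, shortest, taken)
print(f"SR={sr:.2f} SPL={spl_value:.3f}")  # prints "SR=0.75 SPL=0.625"
```

Because SPL discounts successful but inefficient episodes, it can never exceed SR, which is why the two gains in Finding 02 are reported separately.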

Abstract

Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by proposing a Meta-Ability Guided Interactive Chain-of-distillation (MAGIC) method. Specifically, a Meta-Ability Knowledge Distillation (MAKD) framework is proposed for decoupling and refining the necessary meta-abilities of VLN agents. A Meta-Knowledge Randomization Weighting (MKRW) and a Meta-Knowledge Transferable Determination (MKTD) module are incorporated to dynamically adjust aggregation weights at the meta-ability and sample levels, respectively. Move beyond the traditional one-step…
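
To make the abstract's idea of weighting distillation terms at two levels more concrete, here is a minimal, generic sketch. It assumes a temperature-softened KL distillation loss, randomly drawn weights over several loss terms, and a boolean mask that selects which samples contribute. It is not the authors' MKRW or MKTD formulation, whose details are not given in the text above; all function names, logits, and the mask are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def random_weights(n, rng):
    """Draw n positive weights summing to 1 (illustrative randomization only)."""
    raw = [rng.random() + 1e-6 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(per_term_losses, term_weights, sample_mask):
    """Weight the loss terms, then average over samples kept by the mask."""
    kept = [i for i, keep in enumerate(sample_mask) if keep]
    if not kept:
        return 0.0
    total = 0.0
    for i in kept:
        total += sum(w * losses[i] for w, losses in zip(term_weights, per_term_losses))
    return total / len(kept)


# Hypothetical logits for two samples over three candidate actions.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.5, 0.7, -0.5], [0.2, 0.9, 0.4]]

# Two stand-in loss terms per sample (assumption: one term per ability).
term_a = [kd_loss(s, t, temperature=2.0) for s, t in zip(student, teacher)]
term_b = [kd_loss(s, t, temperature=4.0) for s, t in zip(student, teacher)]

rng = random.Random(0)
weights = random_weights(2, rng)
loss = aggregate([term_a, term_b], weights, sample_mask=[True, False])
print(f"weights={weights} loss={loss:.6f}")
```

In this sketch the term weights change whenever they are redrawn, while the mask decides per sample whether the teacher's signal is used at all. A real implementation would compute both from model outputs rather than from fixed values.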

Code & Models

Repositories

crystalsixone/vln-magic (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems

Methods: Semi-Pseudo-Label · Knowledge Distillation