Link-Context Learning for Multimodal LLMs

Yan Tai; Weichen Fan; Zhao Zhang; Feng Zhu; Rui Zhao; Ziwei Liu

arXiv:2308.07891·cs.CV·August 16, 2023·1 cites

Open Access · 1 Repo · 2 Models · 2 Datasets

TL;DR

This paper introduces link-context learning (LCL), a causal reasoning approach for multimodal large language models, enhancing their ability to recognize unseen images and understand novel concepts without additional training.

Contribution

The paper proposes link-context learning (LCL), a causal reasoning method that improves MLLMs' training-free recognition of unseen images and novel concepts from in-context demonstrations, and introduces the ISEKAI dataset for evaluation.

Findings

01. LCL significantly outperforms vanilla MLLMs on unseen image recognition.

02. LCL enhances the understanding of causal relationships in data.

03. The ISEKAI dataset effectively evaluates link-context learning capabilities.

Abstract

The ability to learn from context with novel concepts and to deliver appropriate responses is essential in human conversations. Although current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) are trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the…
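
To make the support-set/query-set idea concrete, here is a minimal sketch of how a prompt with linked demonstrations might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the `Demo` class, the `build_link_context_prompt` function, the `<image:...>` placeholder, the question wording, and the file and concept names are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One support-set demonstration: an image reference linked to a label."""
    image: str  # hypothetical image path or identifier
    label: str  # hypothetical concept name paired with that image


def build_link_context_prompt(support: List[Demo], query_image: str) -> str:
    """Join the demonstrations and the query into one prompt string.

    Each support line pairs an image with its answer; the final line repeats
    the same question for the query image and leaves the answer open.
    The "<image:...>" token is an assumed placeholder, not a real model format.
    """
    question = "What is in this image?"
    lines = [
        f"<image:{demo.image}> {question} Answer: {demo.label}"
        for demo in support
    ]
    lines.append(f"<image:{query_image}> {question} Answer:")
    return "\n".join(lines)


# Hypothetical support set with two linked demonstrations and one query image.
support = [
    Demo("img_001.png", "concept_a"),
    Demo("img_002.png", "concept_b"),
]
prompt = build_link_context_prompt(support, "img_003.png")
print(prompt)
```

The point of the sketch is only the structure: every demonstration carries an explicit image-to-label link, and the query reuses the same question so that the answer has to be inferred from those links rather than from prior training.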

Code & Models

Repositories

isekai-portal/Link-Context-Learning
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications