Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation

Bo Zhang; Hui Ma; Jian Ding; Jian Wang; Bo Xu; Hongfei Lin

arXiv:2405.10121·cs.CL·February 6, 2025

TL;DR

This paper introduces VIKDF, a novel framework that distills and integrates implicit multimodal knowledge into large language models, significantly improving zero-resource dialogue generation by leveraging visual cues.

Contribution

The paper presents a new knowledge distillation and integration method for enhancing LLMs with implicit multimodal knowledge in zero-resource scenarios.

Findings

01. VIKDF outperforms existing models in dialogue quality
02. Effective encoding of visual implicit knowledge improves coherence
03. Seamless integration enhances contextual understanding

Abstract

Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image-text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into…

Code & Models

Repositories

zhangbo-nlp/vikdf (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout · Softmax