ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

Bo Zhang, Jian Wang, Hui Ma, Bo Xu, and Hongfei Lin

arXiv:2308.00400·cs.CL·August 3, 2023

TL;DR

ZRIGF is a novel multimodal framework that enhances zero-resource image-grounded dialogue generation by integrating visual and textual information through contrastive and generative pre-training, demonstrating strong generalization in unseen domains.

Contribution

The paper introduces ZRIGF, a two-stage learning framework combining contrastive and generative pre-training for effective zero-resource image-grounded dialogue generation.

Findings

01. ZRIGF outperforms baselines in generating relevant responses.

02. The framework demonstrates robust generalization to new domains.

03. The proposed modules achieve effective multimodal feature alignment.

Abstract

Image-grounded dialogue systems benefit greatly from integrating visual information, resulting in high-quality response generation. However, current models struggle to effectively utilize such information in zero-resource scenarios, mainly due to the disparity between image and text modalities. To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations. ZRIGF implements a two-stage learning strategy, comprising contrastive pre-training and generative pre-training. Contrastive pre-training includes a text-image matching module that maps images and texts into a unified encoded vector space, along with a text-assisted masked image modeling module that preserves pre-training visual features and fosters further multimodal feature alignment. Generative pre-training…
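
The text-image matching module described in the abstract maps images and texts into one shared vector space. The sketch below illustrates one common way to train such a mapping: a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is an assumption for illustration, not the authors' implementation; the function names, the temperature value of 0.07, and the use of cosine similarity are all hypothetical choices that the abstract does not specify.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def contrastive_matching_loss(text_vecs, image_vecs, temperature=0.07):
    """Symmetric InfoNCE loss (illustrative, not the paper's exact loss).

    text_vecs[i] and image_vecs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    texts = [l2_normalize(v) for v in text_vecs]
    images = [l2_normalize(v) for v in image_vecs]
    n = len(texts)

    # sim[i][j]: temperature-scaled cosine similarity of text i and image j.
    sim = [
        [sum(a * b for a, b in zip(t, im)) / temperature for im in images]
        for t in texts
    ]

    # Text-to-image direction: the correct image for text i is column i.
    t2i = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Image-to-text direction: same computation on the transposed matrix.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    i2t = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n

    return 0.5 * (t2i + i2t)


# Toy 2-d embeddings: aligned pairs versus deliberately swapped pairs.
aligned_loss = contrastive_matching_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped_loss = contrastive_matching_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned_loss < swapped_loss)
```

Under these assumptions, a batch whose text and image embeddings already point in the same directions receives a much lower loss than a batch whose pairs are swapped, which is the signal that pulls the two modalities into a common space.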

Code & Models

Repositories

zhangbo-nlp/zrigf (PyTorch, official)
