Maria: A Visual Experience Powered Conversational Agent

Zujie Liang; Huang Hu; Can Xu; Chongyang Tao; Xiubo Geng; Yining Chen; Fan Liang; Daxin Jiang

arXiv:2105.13073 · cs.CL · June 24, 2021 · 1 citation

TL;DR

Maria is a novel visual experience-powered conversational agent that retrieves images from a large-scale index and generates informed responses grounded in visual knowledge, advancing open-ended image-grounded dialogue.

Contribution

This work introduces Maria, a fully open-ended image-grounded conversational model with a retrieval-based visual knowledge component, unlike prior models relying on paired dialog-image data.

Findings

1. Maria outperforms state-of-the-art methods on automatic metrics.
2. Maria generates responses with visual commonsense of the physical world.
3. Extensive experiments validate the effectiveness of Maria's approach.

Abstract

Arguably, the visual perception of the physical world is a key way for conversational agents to exhibit human-like intelligence. Image-grounded conversation has thus been proposed to address this challenge. Existing works focus on multimodal dialog models that ground the conversation on a given image. In this paper, we take a step further and study image-grounded conversation under a fully open-ended setting where no paired dialog and image are assumed to be available. Specifically, we present Maria, a neural conversation agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator. The retriever aims to retrieve an image correlated with the dialog from an image index, while the visual concept…
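The retrieval step named in the abstract (finding an image correlated with the dialog in an image index) can be illustrated with a toy example. This is a minimal sketch, not the paper's retriever: the abstract does not say how dialog-image correlation is scored, so cosine similarity over precomputed embedding vectors is an assumption, and `cosine`, `retrieve_image`, the image IDs, and the vectors are all hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_image(dialog_vec: Sequence[float],
                   image_index: Dict[str, Sequence[float]]) -> str:
    """Return the ID of the indexed image whose embedding is closest to the dialog.

    ASSUMPTION: dialog and images are already embedded in a shared vector space;
    how those embeddings are produced is outside this sketch.
    """
    return max(image_index, key=lambda image_id: cosine(dialog_vec, image_index[image_id]))


# Hypothetical toy index: image ID -> embedding.
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "kitchen.jpg": [0.1, 0.8, 0.3],
    "street.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a dialog context about a day at the sea.
dialog_vec = [0.8, 0.2, 0.1]

print(retrieve_image(dialog_vec, image_index))
```

In the pipeline the abstract describes, the retrieved image would then be passed to the visual concept detector and the response generator; neither is modeled here.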

Code & Models

Repositories

jokieleung/Maria · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning