Application of frozen large-scale models to multimodal task-oriented dialogue

Tatsuki Kawamoto; Takuma Suzuki; Ko Miyama; Takumi Meguro; Tomohiro Takagi

arXiv:2310.00845 · cs.CL · October 3, 2023

TL;DR

This paper demonstrates that frozen large-scale pre-trained language models, applied through the LENS framework, improve multimodal task-oriented dialogue performance over previously reported Transformer-based models, without additional training.

Contribution

The study applies the LENS framework to multimodal dialogue and shows that frozen large-scale models outperform models trained from scratch on the task dataset.

Findings

01. 10.8% absolute improvement in fluency
02. 8.8% absolute improvement in usefulness
03. 5.2% absolute improvement in relevance and coherence
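According to the abstract, these gains are G-EVAL scores. In the original G-EVAL formulation, the evaluator LLM rates an output on a fixed scale, and the final score is the probability-weighted sum over scores s of p(s) · s. When the model does not expose token probabilities, p(s) can be estimated by sampling the rating several times. The minimal sketch below assumes that sampling variant and a 1 to 5 scale. The page does not say whether the authors computed scores exactly this way, and the helper name `g_eval_score` is hypothetical.

```python
from collections import Counter


def g_eval_score(sampled_scores: list[int], scale: range = range(1, 6)) -> float:
    """Probability-weighted G-EVAL-style score (hypothetical helper).

    p(s) for each score s on the scale is estimated as the relative
    frequency of s among ratings sampled from the evaluator LLM.
    """
    if not sampled_scores:
        raise ValueError("need at least one sampled rating")
    if any(s not in scale for s in sampled_scores):
        raise ValueError("sampled rating outside the rating scale")
    counts = Counter(sampled_scores)
    n = len(sampled_scores)
    # Expected score: sum of p(s) * s over every point on the scale.
    return sum(s * counts[s] / n for s in scale)


# Example: four sampled fluency ratings for one generated response.
print(g_eval_score([4, 4, 5, 3]))  # 4.0
```

Taking the expected value instead of the single most likely rating yields finer-grained, non-integer scores. This is the stated motivation for the weighting in G-EVAL.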

Abstract

In this study, we use the existing Large Language Models ENhanced to See (LENS) Framework to test the feasibility of multimodal task-oriented dialogue. The LENS Framework was proposed as a method for solving computer vision tasks without additional training, keeping the parameters of the pre-trained models fixed. We used the Multimodal Dialogs (MMD) dataset, a multimodal task-oriented dialogue benchmark from the fashion domain. For evaluation, we used the ChatGPT-based G-EVAL, which accepts only text, adapted to handle multimodal data. Compared with Transformer-based models from previous studies, our method achieved absolute lifts of 10.8% in fluency, 8.8% in usefulness, and 5.2% in relevance and coherence. The results show that using large-scale models with fixed parameters rather than using models trained on a dataset from scratch…
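As the abstract describes, LENS keeps every pre-trained model frozen. Vision modules turn each image into text, and that text is placed in the prompt of a frozen LLM, so no parameters are updated. The following is a minimal sketch of how such a prompt might be assembled for one dialogue turn. It assumes the vision outputs are already available as strings; in the LENS paper they come from modules that produce tags, attributes, and captions. The class and function names, the prompt layout, and the stub LLM are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageDescription:
    """Text produced for one image by frozen vision modules (hypothetical)."""
    tags: list[str]
    attributes: list[str]
    captions: list[str]


def build_prompt(images: list[ImageDescription], history: list[str], user_turn: str) -> str:
    """Serialize image descriptions and dialogue context into one text prompt."""
    lines: list[str] = []
    for i, img in enumerate(images, start=1):
        lines.append(f"Image {i}:")
        lines.append("Tags: " + ", ".join(img.tags))
        lines.append("Attributes: " + ", ".join(img.attributes))
        lines.append("Captions: " + " ".join(img.captions))
    lines.append("Dialogue history:")
    lines.extend(history)
    lines.append(f"User: {user_turn}")
    lines.append("Agent:")  # the frozen LLM continues from here
    return "\n".join(lines)


def respond(llm: Callable[[str], str], images: list[ImageDescription],
            history: list[str], user_turn: str) -> str:
    """Generate the agent reply; the LLM is only called, never fine-tuned."""
    return llm(build_prompt(images, history, user_turn))


# Usage with a stub function standing in for a real frozen LLM.
shirt = ImageDescription(
    tags=["shirt", "denim"],
    attributes=["blue", "long sleeves"],
    captions=["a blue denim shirt on a white background"],
)
prompt = build_prompt(
    [shirt],
    ["User: Show me casual shirts.", "Agent: Here are some options."],
    "Do you have it in black?",
)
print(prompt.splitlines()[1])  # Tags: shirt, denim
reply = respond(lambda p: "Let me check the black version for you.",
                [shirt], [], "Do you have it in black?")
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular model API and makes the frozen-model assumption explicit, because nothing in this code updates weights.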

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications