ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report

Yixiao Yuan; Yingzhe Peng

arXiv:2407.09760·cs.CV·July 16, 2024

TL;DR

This paper presents a top-ranked approach for generating emotion explanations from visual-dialog interactions about art. It combines a language model with a large vision-language model to outperform existing baselines.

Contribution

It introduces a novel combination of language and vision-language models for emotion explanation generation in visual-dialog tasks, achieving state-of-the-art results.

Findings

01. Achieved the top rank in the ICCV23 challenge

02. Obtained high F1 and BLEU scores

03. Demonstrated superior emotion explanation accuracy

Abstract

The Visual-Dialog Based Emotion Explanation Generation Challenge focuses on generating emotion explanations from visual-dialog interactions in discussions about art. Our approach combines state-of-the-art models, including a Language Model (LM) and a Large Vision Language Model (LVLM), to achieve superior performance. By leveraging these models, we outperform existing baselines and secure the top rank, with high F1 and BLEU scores, in the ICCV23 Visual-Dialog Based Emotion Explanation Generation Challenge, part of the 5th Workshop on Closing the Loop Between Vision and Language (CLCV). Our method generates accurate emotion explanations, advancing the understanding of how art affects emotion.
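The abstract reports results in F1 and BLEU. As a reference point for what these metrics measure, here is a minimal from-scratch sketch. It assumes F1 is macro-averaged over predicted emotion labels and BLEU is corpus-level over generated explanations, with whitespace tokenization, one reference per explanation, uniform 1- to 4-gram weights, and no smoothing. These are illustrative assumptions, not the challenge's official scorer, whose averaging and tokenization choices may differ. The function names `macro_f1` and `corpus_bleu` are hypothetical.

```python
from collections import Counter
import math


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred (assumed averaging scheme)."""
    labels = set(gold) | set(pred)
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU, one reference per hypothesis, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = ref.split(), hyp.split()  # assumed whitespace tokenization
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_toks, n)
            ref_counts = ngrams(ref_toks, n)
            # Clipped counts: each hypothesis n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, `macro_f1(["joy", "sad", "joy"], ["joy", "sad", "sad"])` gives 2/3, since each of the two classes has F1 = 2/3. Macro averaging weights every emotion class equally, so rare emotions count as much as frequent ones.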
