ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report
Yixiao Yuan, Yingzhe Peng

TL;DR
This report describes the top-ranked approach in the ICCV23 Visual-Dialog Based Emotion Explanation Generation Challenge. The approach combines a language model with a large vision-language model to generate emotion explanations from visual dialogs about artworks.
Contribution
It introduces a novel combination of language and vision-language models for emotion explanation generation in visual-dialog tasks, achieving state-of-the-art results.
Findings
Ranked first in the ICCV23 Visual-Dialog Based Emotion Explanation Generation Challenge
Reported strong F1 and BLEU scores
Generated more accurate emotion explanations than the existing benchmarks
Abstract
The Visual-Dialog Based Emotion Explanation Generation Challenge focuses on generating emotion explanations through visual-dialog interactions in art discussions. Our approach combines state-of-the-art multi-modal models, namely a Language Model (LM) and a Large Vision Language Model (LVLM), to achieve superior performance. By leveraging these models, we outperform existing benchmarks and secure the top rank, with strong F1 and BLEU scores, in the ICCV23 Visual-Dialog Based Emotion Explanation Generation Challenge, which is part of the 5th Workshop On Closing The Loop Between Vision And Language (CLCV). Our method generates accurate emotion explanations, advancing our understanding of the emotional impact of art.
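The abstract reports results in F1 and BLEU without restating how these metrics are computed. As a rough reference, the sketch below computes macro-averaged F1 over predicted emotion labels and corpus-level BLEU-4 over tokenized explanations. It is an illustration under assumptions, not the challenge's official scorer: the averaging scheme, tokenization, number of references, and BLEU variant used by the organizers are not stated here, and the function names `macro_f1` and `corpus_bleu` and the example labels are hypothetical.

```python
from collections import Counter
import math


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred (assumed scheme)."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU with one reference per hypothesis, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: an n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


gold = ["contentment", "sadness", "awe", "sadness"]
pred = ["contentment", "awe", "awe", "sadness"]
print(round(macro_f1(gold, pred), 3))  # 0.778

tokens = "the painting feels calm and peaceful".split()
print(corpus_bleu([tokens], [tokens]))  # 1.0
```

Macro averaging weights every emotion class equally, so rare emotions count as much as frequent ones; a weighted or micro average would give different numbers. For comparable BLEU scores, a standard implementation such as sacreBLEU with its fixed tokenization is preferable to a hand-rolled version.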
