EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis

SangEun Lee; Yubeen Lee; Eunil Park

arXiv:2505.07164·cs.MM·May 13, 2025

TL;DR

EmoVLM-KD augments an instruction-tuned vision-language model with a lightweight module distilled from conventional vision models, improving visual emotion analysis without the cost of running both models at once.

Contribution

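The paper presents EmoVLM-KD, which distills the knowledge of conventional vision models into a lightweight module attached to an instruction-tuned vision-language model, so that emotion prediction benefits from the complementary strengths of both model types.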

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Maintains computational efficiency compared to dual-model approaches.

03. Effectively balances predictions from vision-language and vision models.
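
One simple way to picture balancing predictions from two sources is a convex combination of their class probabilities. The sketch below is an illustration only: the function names, the scalar `gate`, the emotion labels, and the probability values are all assumptions made for this example, and the listing does not say how the paper actually combines the two predictions.

```python
def fuse_predictions(p_vlm, p_vision, gate):
    """Blend two probability vectors; gate=1.0 trusts the VLM fully.

    Hypothetical helper for illustration, not the paper's fusion rule.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(p_vlm) != len(p_vision):
        raise ValueError("both predictions must cover the same classes")
    return [gate * a + (1.0 - gate) * b for a, b in zip(p_vlm, p_vision)]


def predict_label(probs, labels):
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Hypothetical labels and probabilities, chosen only to show the effect.
labels = ["amusement", "awe", "fear", "sadness"]
p_vlm = [0.6, 0.2, 0.1, 0.1]     # assumed VLM head output
p_vision = [0.1, 0.7, 0.1, 0.1]  # assumed distilled vision module output

vlm_leaning = fuse_predictions(p_vlm, p_vision, gate=0.8)
vision_leaning = fuse_predictions(p_vlm, p_vision, gate=0.2)

print(predict_label(vlm_leaning, labels))
print(predict_label(vision_leaning, labels))
```

Because both inputs are probability vectors and the weights sum to one, the fused vector is again a valid distribution. A learned, per-image gate would replace the fixed scalar in a real system.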

Abstract

Visual emotion analysis, which has gained considerable attention in the field of affective computing, aims to predict the dominant emotions conveyed by an image. Despite advancements in visual emotion analysis with the emergence of vision-language models, we observed that instruction-tuned vision-language models and conventional vision models exhibit complementary strengths in visual emotion analysis, as vision-language models excel in certain cases, whereas vision models perform better in others. This finding highlights the need to integrate these capabilities to enhance the performance of visual emotion analysis. To bridge this gap, we propose EmoVLM-KD, an instruction-tuned vision-language model augmented with a lightweight module distilled from conventional vision models. Instead of deploying both models simultaneously, which incurs high computational costs, we transfer the…
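
To make the idea of distilling a conventional vision model into a lightweight module concrete, here is a minimal sketch of the standard temperature-scaled distillation loss (KL divergence between softened teacher and student distributions). This is the generic textbook formulation, not necessarily the loss used in EmoVLM-KD; the function names, the temperature value, and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic knowledge-distillation loss; assumed here, not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits over three emotion classes.
teacher_logits = [2.0, 0.5, -1.0]  # assumed output of the vision model (teacher)
close_student = [1.8, 0.6, -0.9]   # student that roughly agrees with the teacher
far_student = [-1.0, 0.5, 2.0]     # student that disagrees with the teacher

loss_close = distillation_loss(close_student, teacher_logits)
loss_far = distillation_loss(far_student, teacher_logits)
print(loss_close < loss_far)
```

Minimizing this loss pushes the lightweight student toward the teacher's full output distribution rather than only its top label, which is what lets a small module carry the teacher's behavior without the teacher being deployed at inference time.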

Code & Models

Repositories

sange1104/emovlm-kd (PyTorch, official)

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Multimodal Machine Learning Applications