Moodifier: MLLM-Enhanced Emotion-Driven Image Editing

Jiarong Ye; Sharon X. Huang

arXiv:2507.14024·cs.CV·July 21, 2025

TL;DR

Moodifier combines a large emotion-annotated image dataset (MoodArchive), a fine-tuned vision-language model (MoodifyCLIP), and multimodal large language models to enable precise emotion-driven image editing across various domains.

Contribution

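The paper introduces three components: MoodArchive (an emotion-annotated image dataset), MoodifyCLIP (a vision-language model fine-tuned on it), and Moodifier (a training-free editing model). Together they form a framework for emotion-based image editing that the authors report outperforms existing methods in emotional accuracy and content preservation.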

Findings

01. Moodifier achieves higher emotional accuracy than existing methods.

02. Edits preserve the content of the original image.

03. The system works across diverse visual domains.

Abstract

Bridging emotions and visual content for emotion-driven image editing holds great potential in creative industries, yet precise manipulation remains challenging due to the abstract nature of emotions and their varied manifestations across different contexts. We tackle this challenge with an integrated approach consisting of three complementary components. First, we introduce MoodArchive, an 8M+ image dataset with detailed hierarchical emotional annotations generated by LLaVA and partially validated by human evaluators. Second, we develop MoodifyCLIP, a vision-language model fine-tuned on MoodArchive to translate abstract emotions into specific visual attributes. Third, we propose Moodifier, a training-free editing model leveraging MoodifyCLIP and multimodal large language models (MLLMs) to enable precise emotional transformations while preserving content integrity. Our system works…
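The second component, translating an abstract emotion into specific visual attributes, can be pictured as a CLIP-style ranking step. The sketch below is illustrative only and is not the authors' implementation: the function names, the three-dimensional vectors, and the attribute phrases are all made up for this example, and a real system would obtain embeddings from a trained vision-language model rather than from hand-written numbers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_attributes(emotion_vec, attribute_vecs, top_k=2):
    """Return the top_k (attribute, score) pairs most similar to the emotion."""
    scored = [(name, cosine(emotion_vec, vec)) for name, vec in attribute_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings (assumption: stand-ins for text-encoder outputs).
EMOTION = {"serene": [0.9, 0.1, 0.2]}
ATTRIBUTES = {
    "soft pastel palette": [0.8, 0.2, 0.1],
    "harsh red lighting": [0.1, 0.9, 0.3],
    "still water": [0.7, 0.0, 0.4],
}

top = rank_attributes(EMOTION["serene"], ATTRIBUTES)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two attributes closest to "serene" are the pastel palette and the still water, while the harsh lighting is ranked last and dropped. In a training-free editing setup, phrases selected this way could be passed to an editor as textual guidance.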
