Moodifier: MLLM-Enhanced Emotion-Driven Image Editing
Jiarong Ye, Sharon X. Huang

TL;DR
Moodifier combines a large emotion-annotated image dataset (MoodArchive), a fine-tuned vision-language model (MoodifyCLIP), and multimodal large language models (MLLMs) to enable precise emotion-driven image editing across diverse visual domains.
Contribution
The paper introduces MoodArchive, MoodifyCLIP, and Moodifier, which together form a framework for emotion-driven image editing that outperforms existing methods in emotional accuracy and content preservation.
Findings
Moodifier achieves higher emotional accuracy than existing methods.
Content is preserved during editing.
The system works across diverse visual domains.
Abstract
Bridging emotions and visual content for emotion-driven image editing holds great potential in creative industries, yet precise manipulation remains challenging due to the abstract nature of emotions and their varied manifestations across different contexts. We tackle this challenge with an integrated approach consisting of three complementary components. First, we introduce MoodArchive, an 8M+ image dataset with detailed hierarchical emotional annotations generated by LLaVA and partially validated by human evaluators. Second, we develop MoodifyCLIP, a vision-language model fine-tuned on MoodArchive to translate abstract emotions into specific visual attributes. Third, we propose Moodifier, a training-free editing model leveraging MoodifyCLIP and multimodal large language models (MLLMs) to enable precise emotional transformations while preserving content integrity. Our system works…
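The abstract does not spell out how MoodifyCLIP scores an image against an emotion. Assuming it follows the standard CLIP recipe, a plausible scoring step embeds the image and each emotion prompt, computes cosine similarities, scales them by an inverse temperature, and applies a softmax. The sketch below shows only that generic step. The function names, the toy 3-dimensional embeddings, and the 0.07 temperature (CLIP's usual initial value) are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def emotion_probs(
    image_emb: List[float],
    emotion_embs: Dict[str, List[float]],
    temperature: float = 0.07,  # hypothetical; CLIP commonly initializes at 0.07
) -> Dict[str, float]:
    """CLIP-style zero-shot scoring: softmax over temperature-scaled cosine similarities.

    Assumption: image and emotion-prompt embeddings share one space, as in CLIP.
    """
    logits = {name: cosine(image_emb, emb) / temperature for name, emb in emotion_embs.items()}
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {name: math.exp(v - m) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: v / total for name, v in exps.items()}


# Toy usage with hypothetical embeddings (real CLIP embeddings have hundreds of dimensions).
emotions = {"joy": [1.0, 0.0, 0.0], "sadness": [0.0, 1.0, 0.0], "fear": [0.0, 0.0, 1.0]}
probs = emotion_probs([1.0, 0.2, 0.0], emotions)
print(max(probs, key=probs.get))  # joy
```

In an editing loop, a score like this could check whether an edited image has moved toward the target emotion. How Moodifier actually uses MoodifyCLIP alongside the MLLM is described in the full paper, not here.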
