TL;DR
This paper introduces EmoPrefer, a novel approach leveraging multimodal large language models to understand human emotion preferences, including a new dataset and benchmark for evaluating model performance in emotion preference prediction.
Contribution
It is the first work to explore multimodal LLMs (MLLMs) for decoding human emotion preferences, providing a new dataset, a benchmark, and strategies to improve preference prediction accuracy.
Findings
MLLMs can effectively predict human emotion preferences.
The EmoPrefer-Bench evaluates various models and prompting techniques.
Proposed strategies improve model performance in emotion preference understanding.
Abstract
Descriptive Multimodal Emotion Recognition (DMER) has garnered increasing research attention. Unlike traditional discriminative paradigms that rely on predefined emotion taxonomies, DMER aims to describe human emotional states using free-form natural language, enabling finer-grained and more interpretable emotion representations. However, this free-form prediction paradigm introduces new challenges regarding its evaluation. Previous works depend on ground-truth descriptions, but emotions are inherently tied to diverse human behaviors, and generating a comprehensive and accurate description is inherently demanding. Other researchers reformulate this problem into a more tractable human preference learning task, but pairwise preference annotation involves substantial manual effort. This leads to a question: can we leverage multimodal LLMs (MLLMs) to achieve more cost-efficient preference…
Peer Reviews
Decision: ICLR 2026 Poster
1. This paper proposes EmoPrefer, dedicated to human emotion preference. 2. This paper introduces a new dataset and a new benchmark, laying the foundations for this field. 3. This paper conducts extensive experiments, revealing the upper-bound performance of current models, and proposes techniques to enhance performance on emotion preference prediction. 4. This paper has promising applications in descriptive emotion understanding and MLLM training.
1. This paper aims to investigate whether MLLMs can replace humans in decoding emotion preferences. Besides experiments on EmoPrefer-Data, it would be beneficial to further discuss the relationship between MLLMs and humans in practical applications (mentioned in Figure 12). 2. For EmoPrefer-Data, the authors primarily use samples with unanimous preference annotations. To better reveal human preferences, an analysis of which descriptions humans prefer most—considering factors such as emotion ric
The results of the experiment are promising.
1. The dataset's sample source is too narrow and the scenarios are too limited. All original videos are from the MER2024 dataset, and the scenarios are restricted to "a single character facing forward with complete audio," lacking diverse scenarios such as multi-person interaction and complex environmental background interference. 2. This paper does not explain the theoretical basis for choosing binary WAF as the default metric. Further correlation analysis between binary and tri-class performan
1. This paper addresses a novel research problem. It also presents EmoPrefer-Data and EmoPrefer-Bench to support research in this area. 2. The experimental setup is comprehensive, with insightful analyses.
1. Limited Dataset Scale and Diversity. Only 1,368 annotated pairs. The videos are only sourced from MER2024. 2. Lack of Downstream Evaluation. No demonstration of how emotion preference prediction could enhance practical applications.
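One of the reviews above refers to binary WAF as the default metric without the page defining it. As a rough illustration only, and assuming WAF stands for the weighted average F-score (per-class F1 weighted by class support), a binary preference evaluation could be sketched as follows. The function name, the label values "A" and "B", and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_average_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed meaning of WAF)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of ground-truth samples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because count > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (count / total) * f1
    return score


# Hypothetical pairwise labels: "A" means the first description is preferred,
# "B" means the second description is preferred.
human = ["A", "A", "B", "B", "A", "B"]
model = ["A", "B", "B", "B", "A", "A"]
waf = weighted_average_f1(human, model)
print(f"binary WAF: {waf:.4f}")
```

A tri-class variant would only need a third label (for example, a tie) in both lists; the function itself does not assume two classes.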
