TL;DR
EmoS is a high-fidelity bilingual benchmark for fine-grained streaming emotion understanding. It combines strictly filtered static slices with a dynamic Streaming Monologue subset, both labeled through a dual-layer human annotation pipeline, to support training and evaluation of emotion recognition models.
Contribution
The paper introduces EmoS, a bilingual multimodal benchmark that targets three properties existing benchmarks do not achieve together: ecological validity, signal clarity, and reliable fine-grained labeling.
Findings
Fine-tuning multimodal large language models (MLLMs) on EmoS yields significant gains over zero-shot baselines.
The dual-layer human annotation pipeline provides trusted ground truth that captures continuous emotional evolution.
The dataset and code are publicly available for research use.
Abstract
In the context of today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more critical than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark designed to resolve the limitations of ecological validity and noise in existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning MLLMs (multimodal large language models) on EmoS yields significant gains over zero-shot baselines, laying the foundation for the training and evaluation of future emotion…
