EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

Pengze Guo; Jingxi Liang; Zhiwen Xie; Qifeng Wang; Derek F. Wong

arXiv:2605.08847·cs.CL·May 12, 2026

TL;DR

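EmoS is a high-fidelity bilingual benchmark for fine-grained streaming emotional understanding. It combines strictly filtered static slices with a dynamic Streaming Monologue subset, both labeled through a dual-layer human annotation pipeline, and is intended for training and evaluating emotion recognition models.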

Contribution

The paper introduces EmoS, a multimodal benchmark that targets ecological validity, signal clarity, and reliable fine-grained labeling together, a combination the authors argue existing benchmarks fail to achieve.

Findings

01

Fine-tuning multimodal large language models (MLLMs) on EmoS yields significant gains over zero-shot baselines.

02

EmoS captures continuous emotional evolution with trusted ground truth.

03

The dataset and code are publicly available for research use.

Abstract

In the context of today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more critical than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark designed to resolve the limitations of ecological validity and noise in existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning MLLMs (multimodal large language models) on EmoS yields significant gains over zero-shot baselines, laying the foundation for the training and evaluation of future emotion…

Code & Models

Repositories

NLP2CT/EmoS (GitHub)
