Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

Liang Tang; Hongda Li; Jiayu Zhang; Long Chen; Shuxian Li; Siqi Pei; Tiaonan Duan; Yuhao Cheng

arXiv:2603.13406 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces a segment-based multimodal large language model framework that leverages Qwen3-Omni for nuanced emotion recognition in videos. It captures subtle psychological states such as Ambivalence and Hesitancy, reaching 85.1% accuracy on the test set.

Contribution

It proposes a novel segment-based approach combined with fine-tuned Multimodal Large Language Models to improve recognition of complex emotional states in videos.

Findings

1. Achieved 85.1% accuracy on the test set.

2. Outperformed existing baselines in emotion recognition.

3. Validated the effectiveness of multimodal LLMs in detecting nuanced psychological states.

Abstract

Emotion recognition in videos is a pivotal task in affective computing, and identifying subtle psychological states such as Ambivalence and Hesitancy holds significant value for behavioral intervention and digital health. These states often manifest through cross-modal inconsistencies, such as discrepancies between facial expressions, vocal tones, and textual semantics, which poses a substantial challenge for automated recognition. This paper proposes a recognition framework that integrates temporal segment modeling with Multimodal Large Language Models. To address the computational cost and token limits of long-video processing, we employ a segment-based strategy that partitions videos into short clips with a maximum duration of 5 seconds. We leverage the Qwen3-Omni-30B-A3B model, fine-tuned on the BAH dataset using LoRA and full-parameter strategies via the…
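The segment-based strategy described in the abstract can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the authors' code: the excerpt only fixes the 5-second cap, so the fixed, non-overlapping windows, the shorter final clip that holds the remainder, and the helper names `split_into_segments` and `frame_range` are hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import List

# The 5 s cap comes from the abstract; everything else below is an assumption.
MAX_SEGMENT_SECONDS = 5.0


@dataclass(frozen=True)
class Segment:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds, exclusive

    @property
    def duration(self) -> float:
        return self.end - self.start


def split_into_segments(video_duration: float,
                        max_len: float = MAX_SEGMENT_SECONDS) -> List[Segment]:
    """Hypothetical helper: partition [0, video_duration) into consecutive,
    non-overlapping clips of at most max_len seconds. The last clip keeps
    whatever remains, so it may be shorter than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    segments: List[Segment] = []
    index = 0
    # Compute boundaries from the index (not by repeated addition)
    # so floating-point error does not accumulate over long videos.
    while index * max_len < video_duration:
        start = index * max_len
        end = min(start + max_len, video_duration)
        segments.append(Segment(index, start, end))
        index += 1
    return segments


def frame_range(segment: Segment, fps: float) -> range:
    """Hypothetical helper: map a segment's time span to frame indices."""
    return range(round(segment.start * fps), round(segment.end * fps))


if __name__ == "__main__":
    for seg in split_into_segments(12.0):
        print(seg.index, seg.start, seg.end, frame_range(seg, 30.0))
```

For a 12-second video this yields clips covering 0 to 5 s, 5 to 10 s, and 10 to 12 s, so no clip exceeds the cap. Under this sketch's assumptions, each clip would then be passed to the fine-tuned model separately; how the paper combines segment-level outputs into a video-level decision is not described in this excerpt.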

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Face recognition and analysis