Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion Modeling

Huanzhen Wang; Ziheng Zhou; Zeng Tao; Aoxing Li; Yingkai Zhao; Yuxuan Lin; Yan Wang; Wenqiang Zhang

arXiv:2604.12777 · cs.CV · April 15, 2026

TL;DR

This paper introduces DuSE, a cognition-inspired dual-stream model for vision-based dynamic emotion recognition that integrates semantic and contextual knowledge and reports state-of-the-art results on in-the-wild benchmarks.

Contribution

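The paper proposes a dual-stream architecture built from two components, a Hierarchical Temporal Prompt Cluster (HTPC) and a Latent Semantic Emotion Aggregator (LSEA), which model cognitive priming and knowledge integration to improve emotion perception.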

Findings

01. Achieves state-of-the-art performance on in-the-wild benchmarks.

02. Demonstrates improved interpretability over existing models.

03. Validates the neuro-cognitive plausibility of the approach.

Abstract

The human brain constructs emotional percepts not by processing facial expressions in isolation, but through a dynamic, hierarchical integration of sensory input with semantic and contextual knowledge. However, existing vision-based dynamic emotion modeling approaches often neglect emotion perception and cognitive theories. To bridge this gap between machine and human emotion perception, we propose cognition-inspired Dual-stream Semantic Enhancement (DuSE). Our model instantiates a dual-stream cognitive architecture. The first stream, a Hierarchical Temporal Prompt Cluster (HTPC), operationalizes the cognitive priming effect. It simulates how linguistic cues pre-sensitize neural pathways, modulating the processing of incoming visual stimuli by aligning textual semantics with fine-grained temporal features of facial dynamics. The second stream, a Latent Semantic Emotion Aggregator…
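
The idea of aligning textual semantics with temporal facial features can be illustrated with a minimal sketch. It is not the paper's HTPC implementation, whose details are not given here. It assumes that per-frame visual features and one text embedding per emotion class are already available from some pretrained encoders. The function name `align_text_with_frames`, the temperature value, and the similarity-weighted temporal pooling are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def align_text_with_frames(frame_feats, text_feats, temperature=0.07):
    """Illustrative text-to-frame alignment (an assumption, not the paper's method).

    frame_feats: (T, D) array, one feature vector per video frame.
    text_feats:  (C, D) array, one embedding per emotion-class prompt.
    Returns class probabilities of shape (C,) and per-class frame
    weights of shape (T, C).
    """
    f = l2_normalize(frame_feats)
    t = l2_normalize(text_feats)
    sim = f @ t.T  # (T, C) cosine similarity of every frame to every prompt

    # Softmax over time, per class: frames that match a prompt get more weight.
    w = np.exp((sim - sim.max(axis=0, keepdims=True)) / temperature)
    w = w / w.sum(axis=0, keepdims=True)

    # Pool the similarities over time with those weights, then softmax over classes.
    logits = (w * sim).sum(axis=0) / temperature
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return probs, w

# Usage with random stand-in features (16 frames, 7 classes, 32 dimensions).
rng = np.random.default_rng(0)
probs, weights = align_text_with_frames(rng.normal(size=(16, 32)),
                                        rng.normal(size=(7, 32)))
```

The per-class frame weights show which frames contributed most to each class score, a simple form of temporal attribution in this toy setting.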
