TL;DR
Nano-EmoX is a compact multimodal language model that unifies six affective tasks across three levels (perception, understanding, and interaction) and is trained with P2E, a curriculum-based framework.
Contribution
It introduces Nano-EmoX, a 2.2B-parameter model, and the P2E (Perception-to-Empathy) training framework, which together unify affective tasks and improve cross-task transferability and emotional understanding.
Findings
Achieves state-of-the-art performance on multiple affective benchmarks.
Unifies six core affective tasks across three hierarchy levels.
Demonstrates excellent efficiency and generalization.
Abstract
The development of affective multimodal language models (MLMs) has long been constrained by a gap between low-level perception and high-level interaction, leading to fragmented affective capabilities and limited generalization. To bridge this gap, we propose a cognitively inspired three-level hierarchy that organizes affective tasks according to their cognitive depth (perception, understanding, and interaction) and provides a unified conceptual foundation for advancing affective modeling. Guided by this hierarchy, we introduce Nano-EmoX, a small-scale multitask MLM, and P2E (Perception-to-Empathy), a curriculum-based training framework. Nano-EmoX integrates a suite of omni-modal encoders, including an enhanced facial encoder and a fusion encoder, to capture key multimodal affective cues and improve cross-task transferability. The outputs are projected into a unified language space via…
