Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition
Zeheng Wang, Bo Zhao, Yijie Zhu, Zhishu Liu, Hui Ma, Ruixin Zhang, Shouhong Ding, Qianyu Xie, Zitong Yu

TL;DR
This paper introduces HyperEmo-RAG, a hierarchical hyperbolic retrieval-augmented generation framework that leverages structured emotion knowledge for improved multimodal emotion recognition.
Contribution
It proposes a novel hierarchical hyperbolic embedding and evidence injection method to enhance emotion classification accuracy using structured knowledge.
Findings
HyperEmo-RAG outperforms existing methods on multiple datasets.
Hierarchical hyperbolic embedding improves emotion taxonomy modeling.
Structured evidence injection enhances fine-grained emotion recognition.
Abstract
Multimodal emotion recognition aims to integrate text, audio, and video sources to understand human affective states. Although multimodal large language models excel at multimodal reasoning, they typically treat emotion categories as independent labels, ignoring the rich hierarchical taxonomy of human psychology. Moreover, lacking external contextual knowledge makes them highly susceptible to over-interpreting noisy cues, further complicating fine-grained emotion classification. To address these issues, we propose HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotional knowledge base. Our framework introduces two key innovations. 1) Hierarchical hyperbolic grounding. Recognizing the inherent hierarchical tree structure of emotion taxonomies, we jointly embed hierarchical emotion labels and multimodal samples into a continuous hyperbolic…
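To make the idea of hyperbolic grounding concrete, here is a minimal sketch of the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract is cut off before it names the model the authors use, so the Poincaré ball is an assumption here, and the function name `poincare_distance` and the toy coordinates for `root`, `joy`, `ecstasy`, and `grief` are hypothetical illustrations, not values from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: curvature -1 and the Poincare ball model; the paper's actual
    choice of hyperbolic model is not visible in the truncated abstract.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


# Hypothetical toy layout: coarse labels sit near the origin,
# fine-grained labels sit near the boundary of the ball.
root = (0.0, 0.0)        # taxonomy root
joy = (0.5, 0.0)         # coarse emotion
ecstasy = (0.9, 0.05)    # fine-grained child of joy
grief = (-0.9, 0.05)     # fine-grained label in a different branch

d_parent_child = poincare_distance(joy, ecstasy)
d_across_branches = poincare_distance(ecstasy, grief)
```

In this toy layout the distance between a parent and its child is much smaller than the distance between two leaves in different branches, even though both leaves lie at a similar radius. That growth of distance near the boundary is the property that makes hyperbolic space a natural fit for tree-like label hierarchies.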
