Opening the Black Box: Preliminary Insights into Affective Modeling in Multimodal Foundation Models
Zhen Zhang, Runhao Zeng, Sicheng Zhao, Xiping Hu

TL;DR
This paper investigates how large multimodal foundation models internally represent emotions, revealing that affective understanding is primarily localized in the feed-forward gating projection, which can be efficiently tuned for affective tasks.
Contribution
It provides the first systematic mechanistic analysis showing that affective modeling in multimodal models centers on the gate_proj component, not attention modules.
Findings
Affective adaptation localizes to the feed-forward gating projection (gate_proj)
Tuning approximately 24.5% of parameters achieves 96.6% of full affective task performance
Feed-forward gating mechanisms are structurally central to affective understanding in foundation models
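As a rough illustration of what targeted single-module adaptation could look like in practice, the sketch below selects parameters by name and reports the fraction that would remain trainable. It is not the authors' code. The parameter names follow the LLaMA-style naming convention (`self_attn.q_proj`, `mlp.gate_proj`, and so on), and the toy sizes, the helper names `select_trainable` and `trainable_fraction`, and the single-block inventory are all assumptions made for the example. The resulting fraction is a property of the toy inventory only and is unrelated to the 24.5% figure reported above.

```python
from typing import Dict, List


def select_trainable(param_sizes: Dict[str, int], keyword: str = "gate_proj") -> List[str]:
    """Return the names of parameters whose name contains `keyword`."""
    return [name for name in param_sizes if keyword in name]


def trainable_fraction(param_sizes: Dict[str, int], trainable: List[str]) -> float:
    """Fraction of all parameter elements that belong to the trainable set."""
    total = sum(param_sizes.values())
    chosen = sum(param_sizes[name] for name in trainable)
    return chosen / total


# Hypothetical toy inventory: one transformer block, hidden size 8, FFN size 16.
# Each value is the number of elements in that weight matrix.
toy_params = {
    "model.layers.0.self_attn.q_proj.weight": 8 * 8,
    "model.layers.0.self_attn.k_proj.weight": 8 * 8,
    "model.layers.0.self_attn.v_proj.weight": 8 * 8,
    "model.layers.0.self_attn.o_proj.weight": 8 * 8,
    "model.layers.0.mlp.gate_proj.weight": 16 * 8,
    "model.layers.0.mlp.up_proj.weight": 16 * 8,
    "model.layers.0.mlp.down_proj.weight": 8 * 16,
}

# Everything outside the selected names would be frozen during fine-tuning.
trainable_names = select_trainable(toy_params)
fraction = trainable_fraction(toy_params, trainable_names)

print(trainable_names)
print(f"trainable fraction: {fraction:.3f}")
```

With a real model, the same name filter would typically be applied while iterating over the model's named parameters and toggling their gradient flag, which keeps the selection logic independent of any particular architecture.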
Abstract
Understanding where and how emotions are represented in large-scale foundation models remains an open problem, particularly in multimodal affective settings. Despite the strong empirical performance of recent affective models, the internal architectural mechanisms that support affective understanding and generation are still poorly understood. In this work, we present a systematic mechanistic study of affective modeling in multimodal foundation models. Across multiple architectures, training strategies, and affective tasks, we analyze how emotion-oriented supervision reshapes internal model parameters. Our results consistently reveal a clear and robust pattern: affective adaptation does not primarily focus on the attention module, but instead localizes to the feed-forward gating projection (gate_proj). Through controlled module transfer, targeted single-module adaptation, and…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
