Opening the Black Box: Preliminary Insights into Affective Modeling in Multimodal Foundation Models

Zhen Zhang; Runhao Zeng; Sicheng Zhao; Xiping Hu

arXiv:2601.15906 · cs.CV · January 23, 2026

TL;DR

This paper investigates how large multimodal foundation models internally represent emotions, revealing that affective understanding is primarily localized in the feed-forward gating projection, which can be efficiently tuned for affective tasks.

Contribution

It provides the first systematic mechanistic analysis showing that affective modeling in multimodal models centers on the gate_proj component, not the attention modules.

Findings

01

Affective adaptation localizes to the feed-forward gating projection (gate_proj)

02

Tuning approximately 24.5% of parameters achieves 96.6% of full affective task performance

03

Feed-forward gating mechanisms are structurally central to affective understanding in foundation models
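
To give a feel for why tuning only gate_proj touches roughly a quarter of a model's weights (finding 02), the sketch below estimates the gate_proj parameter share for a generic decoder with gated feed-forward layers. It is a back-of-the-envelope illustration, not the paper's accounting: the function name `gate_proj_share`, its arguments, and the layer layout (four attention projections plus gate/up/down feed-forward projections, no biases, no normalization weights, no vision encoder) are all assumptions made here.

```python
def gate_proj_share(d_model, d_ff, n_layers, vocab_size,
                    kv_dim=None, tied_embeddings=False):
    """Estimate the fraction of weights that sit in gate_proj.

    Assumed (hypothetical) layout per layer:
      attention: q_proj, o_proj (d_model x d_model) and
                 k_proj, v_proj (d_model x kv_dim)
      feed-forward: gate_proj, up_proj, down_proj (d_model x d_ff each)
    Biases, norm weights, and any vision encoder are ignored.
    """
    if kv_dim is None:
        kv_dim = d_model  # plain multi-head attention, no grouped K/V

    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    gate = d_model * d_ff
    mlp = 3 * gate  # gate_proj + up_proj + down_proj

    # Input embedding plus output head, or a single shared matrix if tied.
    emb = vocab_size * d_model * (1 if tied_embeddings else 2)

    total = n_layers * (attn + mlp) + emb
    return n_layers * gate / total
```

Under this layout gate_proj is exactly one third of each feed-forward block, so its share of the whole model is always below one third and shrinks as attention and embedding weights grow. Plugging in a real model's dimensions gives only a rough estimate, since the assumptions above drop several parameter groups.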

Abstract

Understanding where and how emotions are represented in large-scale foundation models remains an open problem, particularly in multimodal affective settings. Despite the strong empirical performance of recent affective models, the internal architectural mechanisms that support affective understanding and generation are still poorly understood. In this work, we present a systematic mechanistic study of affective modeling in multimodal foundation models. Across multiple architectures, training strategies, and affective tasks, we analyze how emotion-oriented supervision reshapes internal model parameters. Our results consistently reveal a clear and robust pattern: affective adaptation does not primarily focus on the attention module, but instead localizes to the feed-forward gating projection (\texttt{gate\_proj}). Through controlled module transfer, targeted single-module adaptation, and…

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications