Physics-based phenomenological characterization of cross-modal bias in multimodal models
Hyeongmo Kim, Sohyun Kang, Yerin Choi, Seungyeon Ji, Junhyuk Woo, Hyunsuk Chung, Soyeon Caren Han, Kyungreem Han

TL;DR
This paper introduces a physics-inspired phenomenological approach to analyze and understand cross-modal bias in multimodal large language models, revealing how multimodal inputs can reinforce modality dominance.
Contribution
The paper develops a physics-based surrogate model to analyze transformer dynamics and cross-modal bias, offering a perspective beyond traditional embedding analyses.
Findings
Multimodal inputs can reinforce modality dominance.
Structured error-attractor patterns reveal bias reinforcement.
Dynamical analysis shows complex interaction effects.
Abstract
The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparative (where fairness is understood as formal equality, such as "treat like cases as like") and non-comparative (where unfairness arises from the model's inaccuracy, arbitrariness, or inscrutability) contexts. Recent advances in multimodal large language models (MLLMs) are breaking new ground in multimodal understanding, reasoning, and generation; however, we argue that inconspicuous distortions arising from complex multimodal interaction dynamics can lead to systematic bias. The purpose of this position paper is twofold: first, it is intended to acquaint AI researchers with phenomenological explainable approaches that rely on the physical entities that the machine experiences during training/inference, as opposed to the traditional cognitivist symbolic account or metaphysical approaches;…
Taxonomy
Topics: Embodied and Extended Cognition · Action Observation and Synchronization · Language and cultural evolution
