MIST: Towards Multi-dimensional Implicit BiaS Evaluation of LLMs for Theory of Mind
Yanlin Li, Hao Liu, Huimin Liu, Kun Wang, Yinwei Wei, Yupeng Hu

TL;DR
This paper introduces MIST, a framework for evaluating the multidimensional implicit biases related to Theory of Mind in Large Language Models, using indirect tests to uncover subtle stereotypes.
Contribution
MIST offers a novel multidimensional approach to assess implicit biases in LLMs related to Theory of Mind, using indirect tasks to reveal complex stereotype structures.
Findings
Reveals complex bias structures in LLMs
Improves robustness of bias detection
Demonstrates effectiveness across eight state-of-the-art models
Abstract
Theory of Mind (ToM) in Large Language Models (LLMs) refers to the model's ability to infer the mental states of others, with failures in this ability often manifesting as systemic implicit biases. Assessing this challenge is difficult, as traditional direct inquiry methods are often met with refusal to answer and fail to capture its subtle and multidimensional nature. Therefore, we propose MIST, which reconceptualizes the content model of stereotypes into multidimensional failures of ToM, specifically in the domains of competence, sociability, and morality. The framework introduces two indirect tasks. The Word Association Bias Test (WABT) assesses implicit lexical associations, while the Affective Attribution Test (AAT) measures implicit emotional tendencies, aiming to uncover latent stereotypes without triggering model avoidance. Through extensive experimentation on eight…
Taxonomy
Topics: Neural Networks and Applications
