A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks
Hieu Minh "Jord" Nguyen

TL;DR
This survey reviews how large language models exhibit Theory of Mind, discusses associated safety risks, and proposes future research directions for evaluation and mitigation of these risks.
Contribution
It provides a comprehensive overview of ToM in LLMs, highlighting safety concerns and suggesting new evaluation and mitigation strategies.
Findings
LLMs demonstrate varying levels of Theory of Mind capabilities.
Safety risks include potential misuse and unintended behavior.
Further research is needed on effective evaluation and mitigation of these risks.
Abstract
Theory of Mind (ToM), the ability to attribute mental states to others and predict their behaviour, is fundamental to social intelligence. In this paper, we survey studies evaluating behavioural and representational ToM in Large Language Models (LLMs), identify important safety risks from advanced LLM ToM capabilities, and suggest several research directions for effective evaluation and mitigation of these risks.
Taxonomy
Topics: Topic Modeling
