A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks

Hieu Minh "Jord" Nguyen

arXiv:2502.06470 · cs.CL · February 11, 2025

TL;DR

This survey reviews studies evaluating behavioural and representational Theory of Mind (ToM) in large language models (LLMs), identifies safety risks from advanced ToM capabilities, and proposes research directions for evaluating and mitigating those risks.

Contribution

It provides a comprehensive overview of ToM in LLMs, highlighting safety concerns and suggesting new evaluation and mitigation strategies.

Findings

01

LLMs demonstrate varying levels of Theory of Mind capabilities

02

Safety risks include potential misuse and unintended behavior

03

Future research needed for effective evaluation and risk mitigation

Abstract

Theory of Mind (ToM), the ability to attribute mental states to others and predict their behaviour, is fundamental to social intelligence. In this paper, we survey studies evaluating behavioural and representational ToM in Large Language Models (LLMs), identify important safety risks from advanced LLM ToM capabilities, and suggest several research directions for effective evaluation and mitigation of these risks.

Taxonomy

Topics: Topic Modeling