Boosting Theory-of-Mind Performance in Large Language Models via Prompting
Shima Rahimi Moghaddam, Christopher J. Honey

TL;DR
This paper evaluates the Theory-of-Mind (ToM) reasoning of large language models such as GPT-4 and shows that in-context prompting substantially improves the ToM accuracy of RLHF-trained models.
Contribution
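It systematically measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo) and demonstrates that in-context prompting, using two-shot chain-of-thought examples and step-by-step thinking instructions, improves the accuracy of the RLHF-trained models.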
Findings
GPT-4 achieves nearly 80% ToM accuracy in zero-shot settings
In-context prompts raise the ToM accuracy of RLHF-trained models to over 80%
GPT-4 reaches 100% accuracy with in-context prompting
Abstract
Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain of thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but…
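The prompting conditions named in the abstract (zero-shot, step-by-step thinking instructions, and two-shot chain-of-thought examples) can be illustrated with a small prompt builder. This is a minimal sketch, not the authors' code: the function name `build_prompt`, the "Scenario / Question / Answer" layout, the instruction wording in `STEP_BY_STEP`, and the example scenarios are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed wording for the step-by-step instruction; the paper's exact text may differ.
STEP_BY_STEP = "Let's think step by step."

# Each in-context example is (scenario, question, worked reasoning ending in an answer).
Example = Tuple[str, str, str]


def build_prompt(
    scenario: str,
    question: str,
    examples: Sequence[Example] = (),
    step_by_step: bool = False,
) -> str:
    """Assemble a prompt for one ToM test item.

    examples=()  and step_by_step=False -> zero-shot
    examples=()  and step_by_step=True  -> step-by-step instruction only
    two examples and step_by_step=False -> two-shot chain of thought
    two examples and step_by_step=True  -> both combined
    """
    parts = []
    # In-context examples come first, each with its worked reasoning.
    for ex_scenario, ex_question, ex_reasoning in examples:
        parts.append(
            f"Scenario: {ex_scenario}\nQuestion: {ex_question}\nAnswer: {ex_reasoning}"
        )
    # The test item ends with an open answer cue for the model to complete.
    answer_cue = "Answer:"
    if step_by_step:
        answer_cue += " " + STEP_BY_STEP
    parts.append(f"Scenario: {scenario}\nQuestion: {question}\n{answer_cue}")
    return "\n\n".join(parts)


# Hypothetical scenarios, invented here for illustration only.
two_shots = [
    (
        "Ana puts her keys in a drawer and leaves. Ben moves them to a shelf.",
        "Where will Ana look for her keys first?",
        "Ana did not see Ben move the keys, so she still believes they are "
        "in the drawer. She will look in the drawer.",
    ),
    (
        "A box labeled 'cookies' actually contains buttons. Cy has never opened it.",
        "What does Cy think is in the box?",
        "Cy can only see the label, so Cy believes the box contains cookies.",
    ),
]

zero_shot = build_prompt("Dee hides a ball under a cup, then leaves.", "Where is the ball?")
combined = build_prompt(
    "Dee hides a ball under a cup, then leaves. Eli moves it to a bag.",
    "Where will Dee look for the ball?",
    examples=two_shots,
    step_by_step=True,
)
```

Each string would then be sent to the model under test and the completion scored against the expected answer; the model call and scoring are omitted here because they depend on the API and rubric used.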
