Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
Eunsu Kim, Junyeong Park, Juhyun Oh, Kiwoong Park, Seyoung Song, A. Seza Doğruöz, Alice Oh, Najoung Kim

TL;DR
This paper introduces SCRIPTS, a bilingual dataset for evaluating LLMs' ability to infer social relationships between speakers in dialogues. It reveals limitations in current models, especially in Korean, as well as social biases.
Contribution
The creation of SCRIPTS, a novel bilingual dataset, and an evaluation framework for assessing LLMs' social reasoning in English and Korean dialogues.
Findings
LLMs achieve around 75-80% accuracy in English and 58-69% in Korean.
Models predict an Unlikely relationship in 10-25% of responses in both languages.
Thinking models and chain-of-thought prompting offer minimal benefits and occasionally amplify social biases.
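The two headline numbers in the findings, accuracy and the share of Unlikely predictions, can be illustrated with a minimal sketch. It assumes each dialogue carries a set of acceptable relationships and a set of relationships judged Unlikely, and that the model returns a single label. The `Example` class, its field names, the `score` function, and the toy data are all hypothetical and are not taken from the SCRIPTS release or the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One dialogue with hypothetical gold annotations (not the SCRIPTS schema)."""
    likely: frozenset    # relationships assumed acceptable for this dialogue
    unlikely: frozenset  # relationships assumed to be annotated as Unlikely
    prediction: str      # single relationship label returned by the model


def score(examples):
    """Return (accuracy, unlikely_rate) as fractions in [0, 1]."""
    if not examples:
        raise ValueError("need at least one example")
    n = len(examples)
    # A prediction counts as correct if it falls in the acceptable set.
    correct = sum(ex.prediction in ex.likely for ex in examples)
    # A prediction counts as Unlikely if it falls in the implausible set.
    implausible = sum(ex.prediction in ex.unlikely for ex in examples)
    return correct / n, implausible / n


# Toy data for illustration only; these are not dialogues from the dataset.
toy = [
    Example(frozenset({"friends"}), frozenset({"lovers"}), "friends"),
    Example(frozenset({"lovers"}), frozenset({"siblings"}), "siblings"),
    Example(frozenset({"colleagues"}), frozenset({"lovers"}), "friends"),
    Example(frozenset({"siblings"}), frozenset({"colleagues"}), "siblings"),
]
accuracy, unlikely_rate = score(toy)
print(f"accuracy={accuracy:.2f} unlikely_rate={unlikely_rate:.2f}")
```

Under these assumptions a prediction can be wrong without being Unlikely (the third toy example), which is why the two rates are reported separately.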
Abstract
As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal communication becomes critical. To explore their capabilities, we introduce SCRIPTS, a 1.1k-dialogue dataset in English and Korean sourced from movie scripts, and propose a social reasoning task based on SCRIPTS that evaluates the capacity of LLMs to infer the social relationships (e.g., friends, lovers) between speakers in each dialogue. Evaluating nine models on our task, we find that current LLMs achieve around 75-80% on the English dataset and 58-69% on the Korean dataset, and that models predict an Unlikely relationship in 10-25% of responses in both languages. Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases. In sum, there are significant limitations in current LLMs' social reasoning capabilities,…
