Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Eunsu Kim; Junyeong Park; Juhyun Oh; Kiwoong Park; Seyoung Song; A. Seza Doğruöz; Alice Oh; Najoung Kim

arXiv:2510.19028·cs.CL·April 21, 2026

TL;DR

This paper introduces SCRIPTS, a bilingual dataset for evaluating how well LLMs infer social relationships from dialogues. Current models show clear limitations, especially in Korean, and exhibit social biases.

Contribution

SCRIPTS, a novel bilingual dataset, together with an evaluation framework for assessing LLMs' social reasoning in English and Korean dialogues.

Findings

01

Current LLMs achieve around 75–80% accuracy on the English dataset and 58–69% on the Korean dataset.

02

Models predict an Unlikely relationship in 10–25% of responses in both languages.

03

Thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases.

Abstract

As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal communication becomes critical. To explore their capabilities, we introduce SCRIPTS, a 1.1k-dialogue dataset in English and Korean sourced from movie scripts, and propose a social reasoning task based on SCRIPTS that evaluates the capacity of LLMs to infer the social relationships (e.g., friends, lovers) between speakers in each dialogue. Evaluating nine models on our task, we find that current LLMs achieve around 75–80% on the English dataset and 58–69% in Korean, and that models predict an Unlikely relationship in 10–25% of responses in both languages. Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases. In sum, there are significant limitations in current LLMs' social reasoning capabilities,…
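
The two headline numbers in the abstract (the share of correct relationship predictions and the share of Unlikely predictions, per language) can be illustrated with a small scoring sketch. It assumes, hypothetically, that each evaluated dialogue is stored as a record with a language code, a gold relationship, the model's predicted relationship, and a set of relationships considered Unlikely for that dialogue. The field names, the example labels, and the exact-match definition of accuracy are illustrative assumptions, not the SCRIPTS schema or the authors' scoring code.

```python
from collections import defaultdict


def summarize_by_language(records):
    """Return per-language accuracy and Unlikely-prediction rate.

    Each record is a dict with hypothetical keys:
      "lang"      - language code, e.g. "en" or "ko"
      "gold"      - annotated relationship for the dialogue
      "predicted" - relationship chosen by the model
      "unlikely"  - set of relationships judged Unlikely for the dialogue
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["lang"]].append(record)

    summary = {}
    for lang, items in grouped.items():
        total = len(items)
        # Exact-match accuracy (an assumption about how scores are computed).
        correct = sum(r["predicted"] == r["gold"] for r in items)
        # Share of predictions that fall in the dialogue's Unlikely set.
        unlikely = sum(r["predicted"] in r["unlikely"] for r in items)
        summary[lang] = {
            "accuracy": correct / total,
            "unlikely_rate": unlikely / total,
        }
    return summary


# Toy records for illustration only; these are not taken from SCRIPTS.
toy_records = [
    {"lang": "en", "gold": "friends", "predicted": "friends",
     "unlikely": {"parent-child"}},
    {"lang": "en", "gold": "lovers", "predicted": "friends",
     "unlikely": {"siblings"}},
    {"lang": "ko", "gold": "colleagues", "predicted": "siblings",
     "unlikely": {"siblings"}},
    {"lang": "ko", "gold": "lovers", "predicted": "lovers",
     "unlikely": {"parent-child"}},
]

result = summarize_by_language(toy_records)
print(result)
```

Keeping the Unlikely rate separate from accuracy distinguishes a wrong but plausible guess (lovers mistaken for friends) from a prediction that the dialogue context rules out.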

Code & Models

Repositories

rladmstn1714/SCRIPTS
github

Datasets

EunsuKim/SCRIPTS
dataset· 106 dl
