TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

TL;DR
This paper introduces TimeChara, a benchmark for evaluating point-in-time character hallucination in role-playing LLMs, revealing current models' limitations and proposing Narrative-Experts to mitigate hallucinations.
Contribution
The paper presents a new benchmark, TimeChara, for assessing point-in-time hallucination and proposes Narrative-Experts, a method to reduce such hallucinations in role-playing language models.
Findings
Current state-of-the-art LLMs (e.g., GPT-4o) exhibit significant point-in-time character hallucination.
TimeChara contains 10,895 evaluation instances generated through an automated pipeline.
Narrative-Experts, the proposed method, reduces these hallucinations.
Abstract
While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurately represent characters at specific time points, agents must avoid character hallucination, where they display knowledge that contradicts their characters' identities and historical timelines. We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs. Comprising 10,895 instances generated through an automated pipeline, this benchmark reveals significant hallucination issues in current state-of-the-art LLMs (e.g., GPT-4o). To counter…
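To make the notion of point-in-time character hallucination concrete, here is a minimal sketch of a toy check that flags a reply mentioning events the character could not yet know about. It is not the benchmark's evaluation procedure, which this summary does not describe. The `Event` class, the `future_events_mentioned` function, the keyword-matching approach, and the example timeline are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record of a story event (not a TimeChara data format)."""
    name: str        # keyword phrase that identifies the event in text
    time_index: int  # position of the event on the narrative timeline


def future_events_mentioned(response: str, events: list[Event], character_time: int) -> list[str]:
    """Return names of events that appear in `response` but happen after `character_time`.

    Assumption: a simple case-insensitive substring match stands in for
    real knowledge detection, which would need far more than keywords.
    """
    text = response.lower()
    return [
        e.name
        for e in events
        if e.time_index > character_time and e.name.lower() in text
    ]


# Made-up timeline: the character is situated at time 5,
# so anything with time_index > 5 is "future" knowledge.
timeline = [
    Event("the tournament", 4),
    Event("the final battle", 7),
]
reply = "I still can't believe how the final battle ended."
print(future_events_mentioned(reply, timeline, character_time=5))
```

A non-empty result means the reply leaks knowledge from later in the story. Situating the same character at time 7 would make the reply acceptable, which is why the time point has to be part of the check.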
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
