Dynamic Context Adaptation for Consistent Role-Playing Agents with Retrieval-Augmented Generations
Jeiyoon Park, Yongshin Han, Minseop Kim, Kisu Yang

TL;DR
This paper introduces Amadeus, a training-free framework that improves persona consistency in retrieval-augmented role-playing agents, and CharacterRAG, a new dataset for evaluating such agents.
Contribution
It presents a novel, training-free method for enhancing persona consistency in RAG-based RPAs and provides a comprehensive dataset for evaluation.
Findings
Amadeus significantly improves persona consistency, even for questions that lie beyond a character's knowledge.
The dataset CharacterRAG enables rigorous evaluation.
RAG-based RPAs can model knowledge and personality attributes.
Abstract
Building role-playing agents (RPAs) that faithfully emulate specific characters remains challenging because collecting character-specific utterances and continually updating model parameters are resource-intensive, making retrieval-augmented generation (RAG) a practical necessity. However, despite the importance of RAG, there has been little research on RAG-based RPAs. For example, we empirically find that when a persona lacks knowledge relevant to a given query, RAG-based RPAs are prone to hallucination, making it challenging to generate accurate responses. In this paper, we propose Amadeus, a training-free framework that can significantly enhance persona consistency even when responding to questions that lie beyond a character's knowledge. In addition, to underpin the development and rigorous evaluation of RAG-based RPAs, we manually construct CharacterRAG, a role-playing dataset that…
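The failure mode the abstract describes, a RAG-based agent hallucinating when the persona holds no knowledge relevant to the query, can be illustrated with a minimal sketch. This is not the Amadeus method, whose details are not given here; it only shows a naive guard that drops low-relevance chunks and flags the query as out-of-knowledge. The toy bag-of-words `embed`, the functions `retrieve` and `build_prompt`, the `PERSONA_CHUNKS` persona, and the 0.2 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2, min_score=0.2):
    """Return (kept_chunks, in_knowledge); min_score is an arbitrary illustrative threshold."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    # Keep only the top-k chunks that clear the relevance threshold.
    kept = [c for score, c in scored[:k] if score >= min_score]
    return kept, bool(kept)


# Hypothetical persona document, already split into chunks.
PERSONA_CHUNKS = [
    "Mira grew up in a harbor town and repairs fishing boats.",
    "Mira distrusts the city council and values independence.",
]


def build_prompt(query, chunks=PERSONA_CHUNKS):
    """Build a prompt that refuses to ground on irrelevant chunks."""
    context, in_knowledge = retrieve(query, chunks)
    if in_knowledge:
        header = "Answer in character using only this context:\n" + "\n".join(context)
    else:
        header = ("The persona has no relevant knowledge. "
                  "Stay in character and say so instead of inventing facts.")
    return header + "\n\nUser: " + query
```

With this toy persona, a question about Mira's hometown keeps the first chunk, while a question about the capital of France keeps nothing and takes the out-of-knowledge branch. A fixed similarity threshold is the simplest possible filter and would need tuning per embedding model.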
Peer Reviews
Decision: Submitted to ICLR 2026
The paper explicitly targets a common failure mode in RAG-based role-playing: when a user asks about aspects that are not explicitly in the persona, vanilla retrievers overuse low-relevance chunks and the agent hallucinates. The abstract and introduction motivate this crisply and position AMADEUS as training-free with three modules. ACTS preserves hierarchical context with empirical support that maximizes summed similarity and minimizes variance; ACTS/ATS outperform standard splitters across em
CharacterRAG contains only 15 fictional characters, and much of the persona content is mined from Namuwiki; it remains unclear how well the findings transfer to real people or evolving personas. Adding non-fictional or time-varying personas would strengthen the claims. While ACTS’s hierarchical extraction cost is noted (O(N)), the end-to-end latency and token/dollar costs (especially for GS/AE with large models) are not reported in detail across LLMs/datasets, limiting deployment guidance. The related w
1. AMADEUS (with ACTS, GS, AE modules) fixes RAG-based RPAs’ hallucinations and poor persona consistency in out-of-knowledge queries, outperforming traditional RAG by enhancing chunking, filtering, and attribute extraction.
2. The manually built CharacterRAG (15 characters, 976K chars, 450 QAs) removes interference (e.g., editor’s inferences) and fills the lack of dedicated RAG-based RPA evaluation resources.
3. Using 3 LLMs, 3 embedding models, 3 baselines, and covering in/out-of-knowled
1. The CharacterRAG dataset includes 15 fictional characters, but the paper does not specify their genre (e.g., anime, novel, film) or personality span (e.g., introverted vs. extroverted, heroic vs. villainous). If characters are concentrated in a single genre or share similar traits, the framework’s generalization to diverse role-playing scenarios (e.g., classical novel characters) remains unvalidated.
2. The Attribute Extractor (AE) only extracts "Belief and Value" and "Psychological Traits
i. This paper's attempt to improve the retrieval accuracy of persona information for RAG-based role-playing agents is meaningful.
ii. The collection of a new dataset demonstrates the authors’ effort to empirically explore this problem and provides a potential resource for future studies.
i. The paper is poorly written, and many essential details are missing, which makes it difficult to fully understand and reproduce the proposed approach.
- The description of the CharacterRAG dataset construction process lacks sufficient detail. It is unclear how the persona documents were collected and processed, how the 450 QA pairs were generated, and what standards were used to filter unqualified documents. Furthermore, the authors do not discuss any measures taken to ensure the fidelity an
