CAST: Character-and-Scene Episodic Memory for Agents
Kexin Ma, Bojun Li, Yuhua Tang, Liting Sun, Ruochun Jin

TL;DR
CAST introduces a dual memory system combining character-scene episodic memory with semantic memory, inspired by dramatic theory, to improve coherent event recall in agents, especially for time-sensitive conversational tasks.
Contribution
The paper presents CAST, a memory architecture that models episodic events as three-dimensional scenes (time, place, topic) and organizes them into character profiles, improving the coherence and retrieval of agent memory.
Findings
Improves F1 by 8.11% on average over baselines.
Improves the J score (LLM-as-a-Judge) by 10.21% on average over baselines.
Effective on open, time-sensitive conversational datasets.
Abstract
Episodic memory, a central component of human memory, is the ability to recall coherent events grounded in who, when, and where. However, most agent memory systems emphasize only semantic recall and store experience in structures such as key-value stores, vectors, or graphs, which makes it hard for them to represent and retrieve coherent events. To address this challenge, we propose a Character-and-Scene based memory architecture (CAST) inspired by dramatic theory. Specifically, CAST constructs 3D scenes (time/place/topic) and organizes them into character profiles that summarize the events of a character to represent episodic memory. Moreover, CAST complements this episodic memory with a graph-based semantic memory, which yields a robust dual memory design. Experiments demonstrate that CAST improves F1 by 8.11% and J (LLM-as-a-Judge) by 10.21% on average over baselines on various…
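The dual memory layout described in the abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names Scene, CharacterProfile, DualMemory, add_scene, add_fact, and recall are hypothetical, the semantic graph is reduced to a plain adjacency map, and retrieval is exact-match filtering rather than whatever retrieval method the paper actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scene:
    """One episodic event placed on three axes: time, place, topic (hypothetical type)."""
    time: str                    # ISO date string, so string order is chronological order
    place: str
    topic: str
    characters: tuple[str, ...]  # who took part in the event
    summary: str


@dataclass
class CharacterProfile:
    """Collects the scenes one character appears in (hypothetical type)."""
    name: str
    scenes: list[Scene] = field(default_factory=list)

    def recall(self, *, time_prefix=None, place=None, topic=None):
        """Return matching scenes in chronological order; None means 'any'."""
        hits = [
            s for s in self.scenes
            if (time_prefix is None or s.time.startswith(time_prefix))
            and (place is None or s.place == place)
            and (topic is None or s.topic == topic)
        ]
        return sorted(hits, key=lambda s: s.time)


class DualMemory:
    """Episodic memory (character profiles) plus a toy semantic graph."""

    def __init__(self):
        self.profiles = {}              # character name -> CharacterProfile
        self.facts = defaultdict(set)   # subject -> {(relation, object)}

    def add_scene(self, scene):
        # File the scene under every character that took part in it.
        for name in scene.characters:
            self.profiles.setdefault(name, CharacterProfile(name)).scenes.append(scene)

    def add_fact(self, subject, relation, obj):
        # Assumed simplification: the semantic graph is stored as labelled edges.
        self.facts[subject].add((relation, obj))

    def recall(self, character, **filters):
        profile = self.profiles.get(character)
        return profile.recall(**filters) if profile else []


memory = DualMemory()
memory.add_scene(Scene("2023-06-10", "park", "hiking", ("Alice",),
                       "Alice planned a hiking trip."))
memory.add_scene(Scene("2023-05-01", "cafe", "job interview", ("Alice", "Bob"),
                       "Alice told Bob about her interview."))
memory.add_fact("Alice", "works_as", "engineer")

may_scenes = memory.recall("Alice", time_prefix="2023-05")
```

Keying scenes by character and filtering on time, place, and topic is what lets a question such as "what did Alice do in May 2023?" return a whole event instead of isolated facts; the semantic side stays separate for time-independent knowledge.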
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Social Robot Interaction and HRI
