Codified Finite-state Machines for Role-playing
Letian Peng, Yupeng Hou, Kun Zhou, Jingbo Shang

TL;DR
This paper introduces Codified Finite-State Machines (CFSMs) and their probabilistic extension, CPFSMs, which use LLMs to automatically generate interpretable state models from character profiles, improving role-playing consistency while capturing variability.
Contribution
The paper presents a novel framework for automatically creating finite-state machines from textual profiles using LLMs, enhancing role-playing with interpretable and probabilistic state modeling.
Findings
CFSMs outperform baseline methods in structured role-playing tasks.
CPFSMs effectively model uncertainty and variability in open-ended scenarios.
Both models outperform the baselines in synthetic and real-plot evaluations.
Abstract
Modeling latent character states is crucial for consistent and engaging role-playing (RP) with large language models (LLMs). Yet, existing prompting-based approaches mainly capture surface actions, often failing to track the latent states that drive interaction. We revisit finite-state machines (FSMs), long used in game design to model state transitions. While effective in small, well-specified state spaces, traditional hand-crafted, rule-based FSMs struggle to adapt to the open-ended semantic space of RP. To address this, we introduce Codified Finite-State Machines (CFSMs), a framework that automatically codifies textual character profiles into FSMs using LLM-based coding. CFSMs extract key states and transitions directly from the profile, producing interpretable structures that enforce character consistency. To further capture uncertainty and variability, we extend CFSMs into Codified…
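To make the idea of a codified state machine concrete, here is a minimal Python sketch. It is not the authors' code: the state names, event names, and the `mario_transition` and `run` helpers are all hypothetical, loosely inspired by the Mario example the reviewers mention. In the framework described above, a function of this kind would be written by an LLM from a textual character profile rather than by hand.

```python
from typing import Callable, List

State = str
Event = str


def mario_transition(state: State, event: Event) -> State:
    """Toy transition rules (illustrative only, not taken from the paper)."""
    if state == "defeated":
        return state  # terminal state: nothing changes it
    if event == "mushroom" and state == "small":
        return "super"
    if event == "flower":
        return "fire"
    if event == "hit":
        # A powered-up character shrinks; a small one is defeated.
        return "defeated" if state == "small" else "small"
    return state  # unknown events leave the state unchanged


def run(transition: Callable[[State, Event], State],
        start: State, events: List[Event]) -> List[State]:
    """Apply the transition function to each event and record every state."""
    trace = [start]
    for event in events:
        trace.append(transition(trace[-1], event))
    return trace


trace = run(mario_transition, "small", ["mushroom", "flower", "hit", "hit"])
print(trace)  # ['small', 'super', 'fire', 'small', 'defeated']
```

Because the rules are ordinary code, each transition can be read, tested, and corrected directly, which is the interpretability benefit the abstract points to.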
Peer Reviews
Decision: ICLR 2026 Poster
- The codification of character logic via FSMs, driven by LLMs, presents a novel mechanism to preserve behavioral coherence in long-form role-playing.
- Experimental results show a clear improvement in behavioral consistency after introducing CFSM. Whether in synthetic tasks (e.g., Mario state transitions) or real narrative scenarios, characters’ state transitions become more coherent and believable. CFSM and CPFSM effectively reduce the confusion and inconsistency commonly observed in prompt-b
The proposed framework heavily depends on the LLM to extract states and generate transition rules. If the LLM-produced code contains errors or omissions, it may compromise the correctness of the resulting finite-state machine. The paper provides limited discussion on how to validate or correct the logic generated by the LLM, leaving the reliability of the approach partially contingent on the quality of the LLM’s rule extraction process. Another concern lies in the current evaluation, which prim
1) The described methods work on the various artifacts mentioned in the results and demonstrate strong performance against the baselines. 2) The paper reports the computational complexity of both methods and shows faster, more efficient codification for the proposed methods. 3) The paper includes a very detailed analysis section covering synthetic and real-plot experiments, is tested with multiple LLMs and techniques, and uses various kinds of plots and scenes from various ge
1) The “preliminary and denotation” section introduces the necessary terminology but lacks examples and a lucid explanation, which would help readers and a general audience unfamiliar with such methods. 2) The multi-modality and reactions of CPFSM lack depth and could be explained more clearly. 3) The real-plot experiments could briefly explain one of the artifacts used in the work as a running example; without one, the paper is less intuitive for new readers.
1. Interpretability: The framework brings interpretability to state modeling in RP with executable, codified transitions derived directly from character profiles. 2. Probabilistic Extension: The CPFSM mechanism elegantly integrates stochasticity into state transitions, explicitly modeling uncertainty in RP. 3. Efficiency: CFSM delivers both accuracy and efficiency, as highlighted in Table 5.
1. Evaluation Scope (Generality): Empirical testing relies primarily on the Fandom Benchmark and three synthetic state machines. The real-world scenarios are derived from highly narrativized, structured data (Fandom plots) with limited diversity of state-space complexity and ambiguity. GPT-4.1 is both judge and model in several settings, and open-ended role-play evaluations rely heavily on LLM judgment. There is insufficient third-party or human evaluation of RP quality, which may limit claims o
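The probabilistic extension that the reviews describe, where state transitions carry explicit uncertainty, can be illustrated with a small Python sketch. This is an assumption-laden toy, not the paper's CPFSM: the states, events, probabilities, and the `TRANSITIONS` table and `step` function are all hypothetical. The sketch keeps a probability distribution over states and pushes it through per-event transition probabilities.

```python
from typing import Dict

Belief = Dict[str, float]  # maps each state to its probability

# Hypothetical P(next state | current state, event); each row sums to 1.
TRANSITIONS: Dict[str, Dict[str, Dict[str, float]]] = {
    "insult": {
        "calm": {"calm": 0.5, "angry": 0.5},
        "angry": {"calm": 0.0, "angry": 1.0},
    },
    "apology": {
        "calm": {"calm": 1.0, "angry": 0.0},
        "angry": {"calm": 0.5, "angry": 0.5},
    },
}


def step(belief: Belief, event: str) -> Belief:
    """Propagate the state distribution through one event."""
    rows = TRANSITIONS.get(event)
    if rows is None:
        return dict(belief)  # unknown events leave the belief unchanged
    updated = {state: 0.0 for state in belief}
    for current, p_current in belief.items():
        for nxt, p_next in rows[current].items():
            updated[nxt] += p_current * p_next
    return updated


belief: Belief = {"calm": 1.0, "angry": 0.0}
for event in ["insult", "insult", "apology"]:
    belief = step(belief, event)
print(belief)  # {'calm': 0.625, 'angry': 0.375}
```

Unlike a deterministic machine, the character here is never forced into a single state: after two insults and an apology, the toy model still assigns some probability to the character being angry, which is one way to represent variability in open-ended scenes.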
Taxonomy
Topics: Artificial Intelligence in Games · Multimodal Machine Learning Applications · Human Motion and Animation
