StoryState: Agent-Based State Control for Consistent and Editable Storybooks
Ayushman Sarkar, Zhenyu Yu, Wei Tang, Chu Chen, Kangning Cui, Mohd Yamani Idna Idris

TL;DR
StoryState introduces an agent-based system that explicitly manages story state for multi-page storybook generation, enabling consistent storybooks and localized, page-level edits through prompt-based control.
Contribution
The paper presents an agent-based orchestration layer that explicitly models story state, improving multi-page editing and cross-page consistency without retraining the underlying generation models.
Findings
Enables localized page edits with minimal unintended changes.
Improves cross-page visual consistency.
Reduces editing time compared to prior methods.
Abstract
Large multimodal models have enabled one-click storybook generation, where users provide a short description and receive a multi-page illustrated story. However, the underlying story state, such as characters, world settings, and page-level objects, remains implicit, making edits coarse-grained and often breaking visual consistency. We present StoryState, an agent-based orchestration layer that introduces an explicit and editable story state on top of training-free text-to-image generation. StoryState represents each story as a structured object composed of a character sheet, global settings, and per-page scene constraints, and employs a small set of LLM agents to maintain this state and derive 1Prompt1Story-style prompts for generation and editing. Operating purely through prompts, StoryState is model-agnostic and compatible with diverse generation backends. System-level experiments on…
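To make the structured story state in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names, fields, and prompt layout are all hypothetical, and the LLM agents that maintain the state in StoryState are replaced by plain methods. The prompt construction only loosely follows the idea of a shared identity description reused across pages and should not be read as the exact 1Prompt1Story format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    """One entry of the character sheet (hypothetical fields)."""
    name: str
    appearance: str


@dataclass
class PageScene:
    """Per-page scene constraints (hypothetical fields)."""
    action: str
    objects: List[str] = field(default_factory=list)


@dataclass
class StoryState:
    """Explicit story state: character sheet, global settings, pages."""
    characters: List[Character]
    global_settings: str
    pages: List[PageScene]

    def identity_prompt(self) -> str:
        # Shared prefix reused on every page so character identity stays fixed.
        cast = " and ".join(f"{c.name}, {c.appearance}" for c in self.characters)
        return f"{self.global_settings}, featuring {cast}"

    def page_prompt(self, index: int) -> str:
        # Shared prefix plus this page's own scene constraints.
        page = self.pages[index]
        prompt = f"{self.identity_prompt()}. {page.action}"
        if page.objects:
            prompt += " with " + ", ".join(page.objects)
        return prompt

    def edit_page(self, index: int, action: Optional[str] = None,
                  objects: Optional[List[str]] = None) -> None:
        # Localized edit: only the targeted page's constraints change.
        page = self.pages[index]
        if action is not None:
            page.action = action
        if objects is not None:
            page.objects = list(objects)


# Hypothetical two-page story used purely for illustration.
state = StoryState(
    characters=[Character("Mira", "a small red fox wearing a blue scarf")],
    global_settings="watercolor children's book illustration, autumn forest",
    pages=[
        PageScene("Mira finds a lantern", ["lantern"]),
        PageScene("Mira crosses a river", ["wooden bridge"]),
    ],
)

before = [state.page_prompt(i) for i in range(len(state.pages))]
state.edit_page(1, objects=["stepping stones"])  # edit page 2 only
after = [state.page_prompt(i) for i in range(len(state.pages))]
```

In this sketch, editing the second page changes only that page's prompt, while the first page's prompt and the shared identity prefix stay identical. That is the kind of localized edit the explicit state is meant to support. Because the output is plain prompt text, any text-to-image backend could consume it.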
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Human Motion and Animation
