Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols
John Joon Young Chung, Melissa Roemmele, Max Kreminski

TL;DR
Toyteller is an AI system that lets users create visual stories by moving toy-like character symbols: the motions steer story text generation and also serve as visuals that accompany the text.
Contribution
It introduces a toy-playing interaction paradigm that couples motion-steered text generation with text-steered motion generation by mapping both motions and text onto a shared semantic space.
Findings
Toyteller outperforms a competitive baseline, GPT-4o, in technical evaluations.
The user study found that toy-playing helps users express intentions that are difficult to verbalize.
Motion alone cannot capture all user intentions, suggesting multimodal approaches.
Abstract
We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols as if playing with toys. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space so that large language models and motion generation models can use it as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study identified that toy-playing helps express intentions difficult to verbalize. However, motions alone could not express all user intentions, suggesting combining it…
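The idea of a shared semantic space acting as a translational layer can be illustrated with a minimal sketch. This is not the authors' implementation: the three axes (approach, speed, contact), the action vocabulary, the constants `SPEED_SCALE` and `CONTACT_DIST`, and the functions `embed_motion` and `nearest_action` are all hypothetical, chosen only to show how a motion between two symbols and a word could land in the same vector space and be compared there.

```python
import math

# Hypothetical shared space with axes (approach, speed, contact).
# The vectors below are made-up illustrative values, not learned embeddings.
ACTION_VECTORS = {
    "chase": (0.9, 0.9, 0.1),
    "hug": (0.8, 0.2, 0.9),
    "flee": (-0.9, 0.9, 0.0),
    "ignore": (0.0, 0.1, 0.0),
}

SPEED_SCALE = 2.0   # assumed step length that counts as "full speed"
CONTACT_DIST = 1.0  # assumed distance below which symbols are "touching"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_motion(track_a, track_b):
    """Map two symbols' 2D trajectories (equal-length lists of (x, y)) into the toy space."""
    d_start = math.dist(track_a[0], track_b[0])
    d_end = math.dist(track_a[-1], track_b[-1])
    # Positive when A ends closer to B than it started, negative when farther.
    approach = (d_start - d_end) / max(d_start, d_end, 1e-9)
    # Mean step length of symbol A, normalized and clipped to [0, 1].
    steps = [math.dist(p, q) for p, q in zip(track_a, track_a[1:])]
    speed = min(1.0, (sum(steps) / len(steps)) / SPEED_SCALE)
    contact = 1.0 if d_end < CONTACT_DIST else 0.0
    return (approach, speed, contact)


def nearest_action(vector):
    """Motion-to-text direction: pick the action word closest to a motion vector."""
    return max(ACTION_VECTORS, key=lambda word: cosine(vector, ACTION_VECTORS[word]))


# Symbol A moves quickly away from a stationary symbol B.
fleeing = embed_motion([(5, 0), (3, 0), (1, 0), (-1, 0)], [(6, 0)] * 4)
# Symbol A drifts slowly toward B and ends up next to it.
hugging = embed_motion([(0, 0), (0.5, 0), (1, 0), (1.5, 0)], [(2, 0)] * 4)

print(nearest_action(fleeing))  # flee
print(nearest_action(hugging))  # hug
```

In a sketch like this, the word returned by `nearest_action` could be passed to a language model as a steering hint, while a lookup in `ACTION_VECTORS` could give a motion generator its conditioning vector for the opposite, text-to-motion direction.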
