BOOKAGENT: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration

Bo Gao; Chang Liu; Yuyang Miao; Siyuan Ma; Ser-Nam Lim

arXiv:2604.16541 · cs.CV · April 21, 2026

TL;DR

BookAgent is a safety-aware multi-agent framework that turns a user draft into a coherent visual storybook by jointly planning, scripting, and illustrating the story, then globally repairing inconsistencies.

Contribution

It introduces a holistic, safety-aware multi-agent system for end-to-end storybook synthesis, addressing multi-modal grounding and child-specific safety constraints.

Findings

01. Outperforms existing methods in narrative coherence and visual consistency.

02. Ensures safety compliance in multi-modal story generation.

03. Effectively calibrates multi-modal alignment and global storytelling consistency.

Abstract

Recent advances in Large Generative Models (LGMs) have revolutionized multi-modal generation. However, generating illustrated storybooks remains an open challenge: prior works mainly decompose the task into separate stages, so holistic multi-modal grounding remains limited. Moreover, while safety alignment has been studied for text-only or image-only generation, existing works rarely integrate child-specific safety constraints into narrative planning and sequence-level multi-modal verification. To address these limitations, we propose BookAgent, a safety-aware multi-agent collaboration framework for high-quality visual narratives. Unlike prior story visualization models that assume a fixed storyline sequence, BookAgent targets end-to-end storybook synthesis from a user draft by jointly planning, scripting, illustrating, and globally repairing…
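To make the plan, script, illustrate, and repair flow named in the abstract more concrete, here is a minimal Python sketch of such a loop. It is not the authors' implementation: every name (`Page`, `plan`, `script`, `illustrate`, `verify`, `repair`, `make_storybook`, `UNSAFE_TERMS`) is hypothetical, the safety check is a toy keyword blocklist standing in for whatever verification BookAgent actually performs, and "illustration" is reduced to building a prompt string rather than calling an image model.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical blocklist; a stand-in for a real child-safety verifier.
UNSAFE_TERMS = {"blood", "weapon"}


@dataclass
class Page:
    text: str
    image_prompt: str = ""
    issues: List[str] = field(default_factory=list)


def plan(draft: str) -> List[str]:
    """Planner stub: split the user draft into one story beat per sentence."""
    return [s.strip() for s in draft.split(".") if s.strip()]


def script(beats: List[str]) -> List[Page]:
    """Scripter stub: turn each beat into the text of one page."""
    return [Page(text=beat + ".") for beat in beats]


def illustrate(page: Page) -> Page:
    """Illustrator stub: build an image prompt instead of rendering an image."""
    page.image_prompt = f"Children's book illustration: {page.text}"
    return page


def verify(pages: List[Page]) -> bool:
    """Sequence-level check: record flagged terms per page, return True if clean."""
    for page in pages:
        page.issues = sorted(t for t in UNSAFE_TERMS if t in page.text.lower())
    return all(not page.issues for page in pages)


def repair(page: Page) -> Page:
    """Repair stub: mask flagged terms, then regenerate the page's prompt."""
    for term in page.issues:
        page.text = re.sub(re.escape(term), "[softened]", page.text,
                           flags=re.IGNORECASE)
    return illustrate(page)


def make_storybook(draft: str, max_rounds: int = 3) -> List[Page]:
    """Run plan -> script -> illustrate, then verify and repair until clean."""
    pages = [illustrate(p) for p in script(plan(draft))]
    for _ in range(max_rounds):
        if verify(pages):
            break
        pages = [repair(p) if p.issues else p for p in pages]
    return pages


if __name__ == "__main__":
    for page in make_storybook("A fox finds a weapon. The fox shares lunch."):
        print(page.text, "|", page.image_prompt)
```

The point of the sketch is the control flow: verification runs over the whole page sequence rather than page by page in isolation, and only the flagged pages are repaired and re-illustrated before the next check.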

Code & Models

Repositories

bogao-code/BookAgent/tree/main (GitHub)
