Self-Reasoning Agentic Framework for Narrative Product Grid-Collage Generation

Minyan Luo; Yuxin Zhang; Yifei Li; Xincan Wang; Fuzhang Wu; Tong-Yee Lee; Oliver Deussen; Weiming Dong

arXiv:2604.16958 · cs.CV · April 21, 2026

TL;DR

This paper introduces a self-reasoning agentic framework that generates narrative product grid collages, using explicit planning and iterative refinement to keep them visually consistent, narratively coherent, and aesthetically harmonious.

Contribution

The framework constructs a product narrative, generates the collage as a single unified image with a shared style across grids, and then evaluates and refines its own output to improve quality.
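The plan, generate, evaluate, and refine steps can be pictured as a simple control loop. The sketch below is illustrative only and is not the authors' implementation: `refine_collage`, `Collage`, the three callables, the round limit, and the score threshold are all hypothetical names and assumptions, and the toy functions merely stand in for the planning, image generation, and evaluation components.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Collage:
    """Hypothetical container for one unified collage and the plan behind it."""
    plan: List[str]   # one short scene description per grid cell
    style: str        # shared style applied to every cell
    image: str        # stand-in for the rendered single-image collage


def refine_collage(
    product_name: str,
    packshot: str,
    plan_fn: Callable[[str, str], Tuple[List[str], str]],
    generate_fn: Callable[[List[str], str, str], str],
    evaluate_fn: Callable[[Collage], Tuple[float, str]],
    max_rounds: int = 3,        # assumed stopping rule, not from the paper
    target_score: float = 0.75,  # assumed stopping rule, not from the paper
) -> Tuple[Collage, float]:
    """Plan once, then generate, score, and regenerate with feedback."""
    plan, style = plan_fn(product_name, packshot)
    feedback = ""
    best, best_score = None, float("-inf")
    for _ in range(max_rounds):
        image = generate_fn(plan, style, feedback)
        candidate = Collage(plan, style, image)
        score, feedback = evaluate_fn(candidate)
        if score > best_score:          # keep the best candidate seen so far
            best, best_score = candidate, score
        if score >= target_score:       # good enough: stop refining
            break
    return best, best_score


# Toy stand-ins so the loop can be run end to end.
def toy_plan(name: str, packshot: str) -> Tuple[List[str], str]:
    scenes = [f"{name} hero shot", f"{name} in use",
              f"{name} detail", f"{name} lifestyle"]
    return scenes, "warm studio light"


def toy_generate(plan: List[str], style: str, feedback: str) -> str:
    suffix = f" | revised: {feedback}" if feedback else ""
    return f"[{style}] " + " / ".join(plan) + suffix


def toy_evaluate(collage: Collage) -> Tuple[float, str]:
    # Pretend a revised image scores higher than a first draft.
    score = 0.5 + 0.5 * collage.image.count("revised")
    return score, "tighten colour palette"


if __name__ == "__main__":
    result, score = refine_collage("Thermos", "thermos.png",
                                   toy_plan, toy_generate, toy_evaluate)
    print(score, result.image)
```

Passing the three stages in as callables keeps the control flow separate from whichever models perform planning, generation, and evaluation, so any of them can be swapped without touching the loop.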

Findings

01. Framework improves aesthetic quality over baselines

02. Enhances narrative richness and visual coherence

03. Iterative self-refinement leads to better results

Abstract

Narrative-driven product photography has become a prevalent paradigm in modern marketing, as coherent visual storytelling helps convey product value and establishes emotional engagement with consumers. However, existing image generation methods do not support structured narrative planning or cross-panel coordination, often resulting in weak storytelling and visual incoherence. In practice, narrative product photography is commonly presented as multi-grid collages, where multiple views or scenes jointly communicate a product narrative. To ensure visual consistency across grids and aesthetic harmony of the overall composition, we generate the collage as a single unified image rather than composing independently synthesized panels. We propose a self-reasoning agentic framework for narrative product grid collage generation. Given a product packshot and its name, the system first constructs…
