AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

Manogna Sreenivas; Rohit Kumar; Soma Biswas

arXiv:2605.20777·cs.CV·May 21, 2026

TL;DR

AttriStory introduces a benchmark and a novel optimization method to improve fine-grained attribute realization, such as clothing color and textures, in visual storytelling generated by diffusion models.

Contribution

The paper presents AttriStory, a benchmark with detailed per-scene attribute specifications, together with a plug-and-play latent optimization module that improves attribute accuracy in visual storytelling.

Findings

01

Consistent improvements in attribute realization across baseline models.

02

The AttriLoss effectively aligns attention maps with desired attributes.

03

Seamless integration with existing story generation pipelines.

Abstract

Visual storytelling with diffusion models has made impressive strides in maintaining character consistency across narrative scenes. However, a critical gap remains: while these methods keep a character consistent across scenes, they provide no systematic way to ensure that fine-grained attributes, such as the colors and textures of clothing and accessories, are faithfully rendered in the generated images. Towards this goal, we introduce AttriStory, a benchmark enabling attribute realization in visual storytelling. We curate 200 multi-scene stories across 10 distinct artistic styles using a Large Language Model. Each scene is constructed with detailed attribute specifications to enable rich visual narratives. Further, to address attribute realization, we propose a plug-and-play latent optimization module that operates during early denoising steps, when the model establishes structural and…
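
The abstract describes a latent optimization step applied during early denoising, and the findings mention a loss that aligns attention maps with the desired attributes. This page gives no formula, so the following is only a toy sketch of the general idea of attention-guided latent optimization, not the paper's AttriLoss. Everything in it is an assumption made for illustration: the dot-product attention stand-in, the overlap-based loss, the finite-difference gradient, and all function names and hyperparameters.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_maps(latent, token_emb):
    """Toy stand-in for cross-attention (hypothetical, not a real model).

    latent:    (P, D) array, P spatial positions with D channels
    token_emb: (T, D) array, one embedding per prompt token
    Returns a (P, T) array; each column is a distribution over positions.
    """
    return softmax(latent @ token_emb.T, axis=0)


def alignment_loss(latent, token_emb, attr_idx, obj_idx):
    """Assumed loss: 1 minus the overlap (Bhattacharyya coefficient)
    between the attribute token's map and the object token's map."""
    attn = attention_maps(latent, token_emb)
    return 1.0 - np.sum(np.sqrt(attn[:, attr_idx] * attn[:, obj_idx]))


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient; a real pipeline would use autograd."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f(x)
        x[idx] = old - eps
        f_minus = f(x)
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


def optimize_latent(latent, token_emb, attr_idx, obj_idx, steps=50, lr=0.1):
    """Gradient descent on the latent, standing in for updates that would
    be applied only during the early denoising steps."""
    latent = latent.copy()
    loss_fn = lambda z: alignment_loss(z, token_emb, attr_idx, obj_idx)
    for _ in range(steps):
        latent -= lr * numerical_grad(loss_fn, latent)
    return latent


# Illustrative run on random data (sizes and indices are arbitrary).
rng = np.random.default_rng(0)
latent0 = rng.normal(size=(16, 4))   # 16 positions, 4 channels
tokens = rng.normal(size=(3, 4))     # 3 prompt tokens
before = alignment_loss(latent0, tokens, attr_idx=1, obj_idx=2)
latent1 = optimize_latent(latent0, tokens, attr_idx=1, obj_idx=2)
after = alignment_loss(latent1, tokens, attr_idx=1, obj_idx=2)
print(f"alignment loss before: {before:.4f}, after: {after:.4f}")
```

In an actual diffusion pipeline the attention maps would come from the denoiser's cross-attention layers and the gradient from automatic differentiation; the finite-difference version here only keeps the sketch dependency-free.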

Code & Models

Repositories

https://manogna-s.github.io/attristory