TL;DR
AttriStory pairs a benchmark with a plug-and-play latent optimization method to improve how faithfully diffusion-based visual storytelling renders fine-grained attributes, such as the color and texture of clothing and accessories.
Contribution
The paper presents AttriStory, a benchmark with detailed attribute specifications and a plug-and-play optimization module for enhanced attribute accuracy in visual storytelling.
Findings
The optimization module consistently improves attribute realization across baseline models.
AttriLoss aligns attention maps with the desired attributes.
The module integrates with existing story generation pipelines as a plug-and-play component.
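The idea of steering attention maps by optimizing the latent during early denoising steps can be illustrated with a toy sketch. This is not the paper's AttriLoss, whose exact form is not given in this summary. It is a stand-in in the spirit of attention-guided latent updates such as Attend-and-Excite, and every name and parameter in it (`attribute_loss`, `optimize_latent`, `early_steps`, the dot-product attention, the learning rate) is an assumption made for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(latent, tokens):
    """Toy cross-attention: one softmax over prompt tokens per spatial position."""
    return [softmax([sum(z * e for z, e in zip(pos, tok)) for tok in tokens])
            for pos in latent]

def attribute_loss(attn, attr_idx, region):
    """Hypothetical loss: 1 minus the strongest attention the attribute token
    receives inside the target region. Returns (loss, best position)."""
    best = max(region, key=lambda p: attn[p][attr_idx])
    return 1.0 - attn[best][attr_idx], best

def optimize_latent(latent, tokens, attr_idx, region, lr=0.5):
    """One gradient-descent step on the latent; returns an updated copy."""
    attn = attention(latent, tokens)
    _, p = attribute_loss(attn, attr_idx, region)
    a = attn[p]
    dim = len(latent[p])
    # Softmax derivative: d a[attr] / d z_p = a[attr] * (e_attr - sum_k a[k] * e_k)
    mean_tok = [sum(a[k] * tokens[k][d] for k in range(len(tokens)))
                for d in range(dim)]
    grad = [-a[attr_idx] * (tokens[attr_idx][d] - mean_tok[d]) for d in range(dim)]
    new_latent = [list(pos) for pos in latent]
    new_latent[p] = [z - lr * g for z, g in zip(latent[p], grad)]
    return new_latent

def denoise_with_attribute_guidance(latent, tokens, attr_idx, region,
                                    num_steps=10, early_steps=4):
    """Apply the latent update only during the first `early_steps` steps."""
    losses = []
    for step in range(num_steps):
        if step < early_steps:
            latent = optimize_latent(latent, tokens, attr_idx, region)
        # A real pipeline would run the diffusion denoiser here; omitted in this toy.
        loss, _ = attribute_loss(attention(latent, tokens), attr_idx, region)
        losses.append(loss)
    return latent, losses

# Example: three prompt tokens (index 1 plays the attribute word, e.g. "red"),
# three spatial positions, and a target region covering positions 0 and 1.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
latent = [[0.2, -0.1], [0.0, 0.0], [0.3, 0.1]]
final_latent, losses = denoise_with_attribute_guidance(
    latent, tokens, attr_idx=1, region=[0, 1], num_steps=6, early_steps=4)
```

In this toy the loss falls only while the update is active, then stays flat, which mirrors the choice to intervene early in denoising. Because the update touches only the latent, a step of this kind could in principle be dropped into an existing pipeline without retraining the model.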
Abstract
Visual storytelling with diffusion models has made impressive strides in maintaining character consistency across narrative scenes. However, a critical gap remains: while these methods keep a character consistent across scenes, they provide no systematic way to ensure that fine-grained attributes, such as the color and texture of clothing and accessories, are faithfully rendered in the generated images. To address this, we introduce AttriStory, a benchmark for attribute realization in visual storytelling. We curate 200 multi-scene stories across 10 distinct artistic styles using a large language model. Each scene is constructed with detailed attribute specifications to enable rich visual narratives. Further, to address attribute realization, we propose a plug-and-play latent optimization module that operates during early denoising steps, when the model establishes structural and…
