Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

Tianyi Song; Jiuxin Cao; Kun Wang; Bo Liu; Xiaofeng Zhang

arXiv:2309.09553 · cs.CV · March 7, 2024

TL;DR

Causal-Story introduces a local causal attention mechanism in diffusion models to improve visual story synthesis by weighting historical information based on causal relevance, leading to more coherent and consistent story generation.

Contribution

It presents a local causal attention approach that weights historical captions and frames by their causal relevance to the current caption, instead of concatenating all history in order with equal weights as the previous state-of-the-art method does.
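To make the contrast between equal weighting and relevance weighting concrete, here is a minimal sketch in plain Python. It assumes each historical caption or frame has already been encoded as a fixed-length vector, and it uses standard scaled dot-product attention with the current caption as the query. The function names (`equal_weight_fusion`, `relevance_weighted_fusion`) are hypothetical; this is not the paper's implementation, which operates on conditioning features inside a diffusion model.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Numerically stable softmax: subtract the max score before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def equal_weight_fusion(history: Sequence[Vector]) -> Vector:
    # Hypothetical baseline: every historical condition gets weight 1/n,
    # regardless of how relevant it is to the current caption.
    n = len(history)
    dim = len(history[0])
    return [sum(h[d] for h in history) / n for d in range(dim)]


def relevance_weighted_fusion(
    query: Vector, history: Sequence[Vector]
) -> Tuple[Vector, Vector]:
    # Hypothetical relevance weighting via scaled dot-product attention:
    # the current-caption embedding (query) scores each history embedding,
    # and the softmax turns those scores into weights that sum to 1.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, h) * scale for h in history])
    dim = len(history[0])
    fused = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return fused, weights


# Toy example: one history item aligned with the current caption, two unrelated.
current_caption = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(equal_weight_fusion(history))
fused, weights = relevance_weighted_fusion(current_caption, history)
print([round(w, 3) for w in weights])
```

In this toy case the aligned history item receives roughly half of the total weight, compared with one third under equal weighting, so it dominates the fused condition.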

Findings

1. Achieved state-of-the-art FID scores on the PororoSV and FlintstonesSV datasets.
2. Generated frames show improved storytelling coherence and visual quality.
3. The model effectively captures causal relationships, improving global consistency.
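FID (Fréchet Inception Distance), the metric cited in the first finding, compares Inception-network feature statistics of real and generated images, so lower values are better. Its standard definition is given here for reference, where μ_r, Σ_r and μ_g, Σ_g denote the mean and covariance of Inception features computed over real and generated frames:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```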

Abstract

The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it connects them in order with equal weights, ignoring that not all historical conditions are associated with the generation of the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, previous frames, and the current caption. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We…
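The abstract does not spell out how the local causal attention is masked. As a hedged illustration only, the sketch below assumes the historical captions, historical frames and current caption are arranged as one chronologically ordered sequence, that "causal" means each position attends only to itself and earlier positions, and that "local" can be modeled by a hypothetical `window` limit on how far back a position may look. The function names are illustrative and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def local_causal_mask(n: int, window: Optional[int] = None) -> List[List[bool]]:
    # mask[i][j] is True when position i may attend to position j.
    # Causal: only the same or earlier positions (j <= i).
    # Local (hypothetical `window`, must be >= 0): at most `window` steps back;
    # None means no limit.
    return [
        [j <= i and (window is None or i - j <= window) for j in range(n)]
        for i in range(n)
    ]


def masked_attention_weights(
    scores: Sequence[Sequence[float]], mask: List[List[bool]]
) -> List[List[float]]:
    # Softmax over each row of attention scores, restricted to allowed positions;
    # masked positions get weight 0. Every row allows j == i, so it is never empty.
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        m = max(s for s, ok in zip(row_scores, row_mask) if ok)
        exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Conditions in story order, e.g. [caption_1, frame_1, caption_2, frame_2, caption_3].
n = 5
mask = local_causal_mask(n, window=2)
uniform_scores = [[0.0] * n for _ in range(n)]
for row in masked_attention_weights(uniform_scores, mask):
    print([round(w, 2) for w in row])
```

With uniform scores, each row spreads its weight evenly over the allowed positions, so the effect of the mask alone is visible; in a trained model the learned scores would then redistribute weight toward the more relevant history within that allowed set.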

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization

Methods: Diffusion