Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
Tianyi Song, Jiuxin Cao, Kun Wang, Bo Liu, Xiaofeng Zhang

TL;DR
Causal-Story introduces a local causal attention mechanism in diffusion models to improve visual story synthesis by weighting historical information based on causal relevance, leading to more coherent and consistent story generation.
Contribution
The paper presents a local causal attention mechanism that weights historical captions and frames by their causal relevance to the current caption, rather than concatenating them in order with equal weights as prior work does.
Findings
Achieved state-of-the-art FID scores on PororoSV and FlintstonesSV datasets.
Generated frames show improved storytelling coherence and visual quality.
Model effectively captures causal relationships for better global consistency.
Abstract
The strong text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it concatenates them in order with equal weights, ignoring that not all historical conditions are relevant to generating the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, previous frames, and the current caption. By weighting the historical conditions according to this relationship when generating the current frame, Causal-Story improves the global consistency of story generation. We…
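The contrast the abstract draws, equal weights versus relevance-based weights over historical conditions, can be illustrated with a small sketch. This is not the paper's implementation: the function name `weight_history`, the `window` parameter, and the choice of scaled dot-product scoring followed by a softmax are all assumptions made for illustration, and the feature vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_history(current, history, window=None):
    """Weight historical features by similarity to the current feature.

    Hypothetical helper, not taken from the paper.
    current: feature vector for the current caption (list of floats).
    history: list of feature vectors for earlier captions or frames.
    window:  if given, only the last `window` entries receive weight
             (an assumed reading of "local"); older entries get 0.
    Returns (weights, context), where context is the weighted sum.
    """
    if not history:
        raise ValueError("history must contain at least one feature")
    start = 0 if window is None else max(0, len(history) - window)
    scale = math.sqrt(len(current))
    # Scaled dot-product score between the current feature and each
    # historical feature inside the window (assumed scoring function).
    scores = [
        sum(c * h for c, h in zip(current, feat)) / scale
        for feat in history[start:]
    ]
    weights = [0.0] * start + softmax(scores)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, history))
        for i in range(len(current))
    ]
    return weights, context


# Toy example: the first and third history entries match the current feature.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
current = [1.0, 0.0]
weights, context = weight_history(current, history)
equal_weights = [1.0 / len(history)] * len(history)  # equal-weight baseline
```

In this toy setup the matching history entries receive more weight than the unrelated one, whereas the equal-weight baseline gives every entry one third regardless of relevance.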
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Diffusion
