TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation

Hongyu Zhang; Yufan Deng; Zilin Pan; Peng-Tao Jiang; Bo Li; Qibin Hou; Zhiyang Dou; Zhen Dong; Daquan Zhou

arXiv:2604.19473 · cs.CV · April 22, 2026

TL;DR

This paper introduces TS-Attn, a novel attention mechanism that improves multi-event video generation by enhancing temporal coherence and content alignment in pre-trained models.

Contribution

The paper proposes a training-free, plug-and-play attention method, TS-Attn, that enhances temporal awareness and coherence in multi-event video synthesis.

Findings

01. Boosts StoryEval-Bench scores by over 33% on Wan2.1-T2V-14B.

02. Increases inference time by only 2%.

03. Supports multi-event image-to-video generation across models.

Abstract

Generating high-quality videos from complex temporal descriptions that contain multiple sequential actions is a key unsolved problem. Existing methods are constrained by an inherent trade-off: using multiple short prompts fed sequentially into the model improves action fidelity but compromises temporal consistency, while a single complex prompt preserves consistency at the cost of prompt-following capability. We attribute this problem to two primary causes: 1) temporal misalignment between video content and the prompt, and 2) conflicting attention coupling between motion-related visual objects and their associated text conditions. To address these challenges, we propose a novel, training-free attention mechanism, Temporal-wise Separable Attention (TS-Attn), which dynamically rearranges attention distribution to ensure temporal awareness and global coherence in multi-event scenarios.…

Code & Models

Repositories

Hong-yu-Zhang/TS-Attn (GitHub)

Videos

TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation (SlidesLive)