Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

Gordon Chen; Ziqi Huang; Ziwei Liu

arXiv:2604.10030·cs.CV·April 14, 2026

TL;DR

Prompt Relay is a plug-and-play method that improves multi-event video generation by enabling precise temporal control and reducing semantic interference without modifying the underlying diffusion model.

Contribution

Prompt Relay introduces a penalty in cross-attention so that each temporal segment of the video attends only to its designated prompt, which improves temporal alignment and visual quality.
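The idea of a cross-attention penalty can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary above only states that a penalty is applied in cross-attention, so the additive form of the penalty, its magnitude, and the names `segment_penalty` and `penalized_cross_attention` are assumptions made for illustration. For brevity the sketch uses one query per frame, whereas a real video diffusion model would have many spatial tokens per frame.

```python
import numpy as np

def segment_penalty(frame_segment, token_segment, penalty=1e4):
    """Build an additive penalty of shape (num_frames, num_tokens).

    Entry [f, t] is 0 when frame f and text token t belong to the same
    segment, and `penalty` otherwise. (Hypothetical helper; the hard,
    constant penalty is an assumption.)
    """
    frame_segment = np.asarray(frame_segment)[:, None]  # (F, 1)
    token_segment = np.asarray(token_segment)[None, :]  # (1, T)
    return np.where(frame_segment == token_segment, 0.0, penalty)

def penalized_cross_attention(q, k, v, penalty):
    """Scaled dot-product cross-attention with the penalty subtracted from the logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) - penalty
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
frame_segment = [0, 0, 0, 1, 1, 1]  # six frame queries covering two events
token_segment = [0, 0, 1, 1, 1]     # five text tokens: two from prompt 0, three from prompt 1
q = rng.normal(size=(6, 8))         # toy frame queries
k = rng.normal(size=(5, 8))         # toy text keys
v = rng.normal(size=(5, 8))         # toy text values

pen = segment_penalty(frame_segment, token_segment)
out, weights = penalized_cross_attention(q, k, v, pen)
```

With a large penalty, frames in the first segment place essentially no attention weight on tokens from the second prompt, and vice versa. A smaller or smoothly varying penalty near segment boundaries would give softer transitions; whether the paper does this is not stated in the summary above.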

Findings

01. Improves text-video alignment in multi-event videos.

02. Reduces semantic bleeding between different events.

03. Enhances visual quality without additional computational cost.

Abstract

Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal succession of multiple events in real-world videos and lack explicit mechanisms to control when semantic concepts appear, how long they persist, and the order in which multiple events occur. Such control is especially important for movie-grade video synthesis, where coherent storytelling depends on precise timing, duration, and transitions between events. When using a single paragraph-style prompt to describe a sequence of complex events, models often exhibit semantic entanglement, where concepts intended for different moments in the video bleed into one another, resulting in poor text-video alignment. To address these limitations, we propose Prompt Relay, an inference-time, plug-and-play method to enable fine-grained temporal control in…

Code & Models

Repositories

gordonchen19/Prompt-Relay (GitHub)
