SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation

Shanwen Tan; Hao Li; Jingtao Zhang; Xiaosong Jia; Xue Yang; Shaofeng Zhang; Yanyong Zhang

arXiv:2605.09442 · cs.CV · May 12, 2026

TL;DR

SWIFT is a training-free framework for efficient, coherent multi-prompt long-video generation. It combines adaptive memory management with semantic injection to reduce inference cost.

Contribution

SWIFT introduces a Semantic Injection Cache, head-wise semantic injection, and an Adaptive Dynamic Window to improve efficiency and semantic coherence in long-video diffusion models.

Findings

01. Achieves 22.6 FPS on a single H100 GPU.

02. Preserves generation quality relative to state-of-the-art methods.

03. Reduces average inference cost through adaptive memory allocation.

Abstract

Streaming long-video generation faces a central challenge in continuous semantic switching, requiring adaptive memory to preserve coherent visual evolution. Current approaches rely on cache rebuilding at prompt boundaries or fixed memory budgets, but they introduce redundant computation and limit flexible semantic adaptation. This limitation arises from a mismatch between cached video history and prompt updates, as memory preserves visual continuity while prompt switches demand rapid semantic adaptation. Motivated by this observation, we present SWIFT, Semantic Windowing and Injection for Flexible Transitions, a training-free framework for multi-prompt long-video generation that enables efficient semantic switching while preserving temporal coherence in causal video diffusion models. SWIFT introduces a lightweight Semantic Injection Cache that augments cached video memory rather than…

Code & Models

Repositories

ShanwenTan/SWIFT (GitHub)
