PromptTea: Let Prompts Tell TeaCache the Optimal Threshold

Zishen Huang; Chunyu Yang; Mengyuan Ren

arXiv:2507.06739·cs.CV·July 10, 2025

Open Access

TL;DR

PromptTea introduces an adaptive caching method that estimates scene complexity from the input prompt and uses it to set reuse thresholds, accelerating video generation (a reported 2.79x speedup on Wan2.1) while maintaining high visual fidelity.

Contribution

The paper proposes Prompt-Complexity-Aware (PCA) caching, which adjusts reuse thresholds according to scene complexity estimated from the prompt, and DynCFGCache, a dynamic reuse method. Together they aim to make cached video generation faster and more robust.

Findings

01. Achieves a 2.79x speedup on the Wan2.1 model
02. Maintains high visual fidelity across diverse scenes
03. Enhances caching robustness with scene-aware adjustments

Abstract

Despite recent progress in video generation, inference speed remains a major bottleneck. A common acceleration strategy involves reusing model outputs via caching mechanisms at fixed intervals. However, we find that such fixed-frequency reuse significantly degrades quality in complex scenes, while manually tuning reuse thresholds is inefficient and lacks robustness. To address this, we propose Prompt-Complexity-Aware (PCA) caching, a method that automatically adjusts reuse thresholds based on scene complexity estimated directly from the input prompt. By incorporating prompt-derived semantic cues, PCA enables more adaptive and informed reuse decisions than conventional caching methods. We also revisit the assumptions behind TeaCache and identify a key limitation: it suffers from poor input-output relationship modeling due to an oversimplified prior. To overcome this, we decouple the…
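The idea of a prompt-dependent reuse threshold can be illustrated with a minimal sketch. It is not the authors' implementation: the word-count complexity proxy, the threshold range, and every name below (`prompt_complexity`, `reuse_threshold`, `ThresholdCache`) are assumptions made for illustration. The cache follows the general threshold-based pattern the abstract describes: the last model output is reused while an accumulated change estimate stays below the threshold, and the model is recomputed otherwise.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


def prompt_complexity(prompt: str) -> float:
    """Hypothetical proxy for scene complexity: distinct-word count, clipped to [0, 1].

    The paper estimates complexity from prompt semantics; this stand-in is an assumption.
    """
    words = prompt.lower().split()
    return min(1.0, len(set(words)) / 40.0)


def reuse_threshold(complexity: float, low: float = 0.05, high: float = 0.30) -> float:
    """Map complexity to a threshold: simple scenes reuse more, complex scenes less.

    The range [low, high] is an illustrative assumption, not a value from the paper.
    """
    c = min(1.0, max(0.0, complexity))
    return high - c * (high - low)


@dataclass
class ThresholdCache:
    """Reuse the cached output while the accumulated change stays under the threshold."""

    threshold: float
    accumulated: float = 0.0
    cached_output: Any = None

    def step(self, rel_change: float, compute: Callable[[], Any]) -> Tuple[Any, bool]:
        """Return (output, reused) for one denoising step.

        rel_change is an estimate of how much the model input changed since the
        previous step (for example a relative L1 distance).
        """
        self.accumulated += rel_change
        if self.cached_output is not None and self.accumulated < self.threshold:
            return self.cached_output, True
        # Change is too large (or nothing is cached yet): run the model and reset.
        self.cached_output = compute()
        self.accumulated = 0.0
        return self.cached_output, False


if __name__ == "__main__":
    prompt = "a cat sleeping on a sofa"
    cache = ThresholdCache(threshold=reuse_threshold(prompt_complexity(prompt)))
    for step_index in range(6):
        _, reused = cache.step(0.1, compute=lambda: "model output")
        print(step_index, "reused" if reused else "computed")
```

Under this sketch, a short prompt yields a threshold near the upper bound, so more steps reuse the cached output; a long, varied prompt pushes the threshold toward the lower bound and forces more frequent recomputation.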

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Enhancement Techniques