Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation

Muhammad Adnan; Nithesh Kurella; Akhil Arunkumar; Prashant J. Nair

arXiv:2506.00329·cs.LG·September 24, 2025

TL;DR

Foresight is an adaptive layer-reuse method for diffusion transformers: it reuses block outputs across denoising steps to cut computation in text-to-video generation while preserving baseline quality.

Contribution

The paper proposes an adaptive layer-reuse technique that adjusts to generation parameters such as resolution and denoising schedule, reducing the computational cost of diffusion transformer-based video synthesis.

Findings

01 Up to x speedup in video generation

02 Maintains high video quality with reduced computation

03 Effective across multiple models, including OpenSora, Latte, and CogVideoX

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art results in text-to-image, text-to-video generation, and editing. However, their large model size and the quadratic cost of spatial-temporal attention over multiple denoising steps make video generation computationally expensive. Static caching mitigates this by reusing features across fixed steps but fails to adapt to generation dynamics, leading to suboptimal trade-offs between speed and quality. We propose Foresight, an adaptive layer-reuse technique that reduces computational redundancy across denoising steps while preserving baseline performance. Foresight dynamically identifies and reuses DiT block outputs for all layers across steps, adapting to generation parameters such as resolution and denoising schedules to optimize efficiency. Applied to OpenSora, Latte, and CogVideoX, Foresight achieves up to \latencyimprv end-to-end…
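The contrast the abstract draws between static caching and adaptive reuse can be illustrated with a toy sketch. This is not the authors' implementation: the class `ReusableBlock`, the helper `static_reuse_schedule`, the mean-squared-error reuse test, and the `threshold` parameter are all hypothetical names and choices made for illustration, and the abstract does not specify the criterion Foresight actually uses.

```python
from typing import Callable, List, Optional

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def static_reuse_schedule(num_steps: int, interval: int) -> List[bool]:
    """Static caching: recompute on a fixed cadence, regardless of the data.

    Returns one flag per denoising step; True means "recompute the block".
    """
    return [step % interval == 0 for step in range(num_steps)]


class ReusableBlock:
    """Toy stand-in for one DiT block with an adaptive reuse decision.

    ASSUMPTION: the block is recomputed only when its input has drifted
    from the input seen at the last recompute by more than `threshold`
    (measured by MSE). This rule is illustrative, not the paper's.
    """

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float):
        self.fn = fn
        self.threshold = threshold
        self.cached_input: Optional[Vector] = None
        self.cached_output: Optional[Vector] = None
        self.computed = 0  # number of real forward passes
        self.reused = 0    # number of cache hits

    def __call__(self, x: Vector) -> Vector:
        close_enough = (
            self.cached_input is not None
            and mse(x, self.cached_input) <= self.threshold
        )
        if close_enough:
            self.reused += 1
            return self.cached_output
        # Input changed too much (or nothing cached yet): run the block.
        self.cached_input = list(x)
        self.cached_output = self.fn(x)
        self.computed += 1
        return self.cached_output


if __name__ == "__main__":
    # Hypothetical "block": doubles every element of its input.
    block = ReusableBlock(lambda x: [2 * v for v in x], threshold=1e-3)
    # Inputs that change a lot early, then settle, then jump again.
    for x in [[1.0], [0.5], [0.49], [0.485], [0.1]]:
        block(x)
    print("computed:", block.computed, "reused:", block.reused)
    print("static schedule:", static_reuse_schedule(5, 2))
```

In this sketch the static schedule recomputes on every second step whether or not the input has changed, while the adaptive wrapper skips work only on the steps where the input is nearly unchanged. A real system would apply such a decision per layer and would have to choose thresholds that respect the target resolution and denoising schedule, which the toy example does not attempt.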

Videos

Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Computer Graphics and Visualization Techniques