TL;DR
HiCache is a training-free feature caching method based on Hermite polynomials. It accelerates diffusion model inference by predicting cached features more accurately, reaching a 5.55x speedup on FLUX.1-dev while maintaining or improving output quality across text-to-image, video, and super-resolution tasks.
Contribution
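The paper presents a training-free acceleration framework that uses Hermite polynomials to predict features in diffusion Transformers, motivated by the observation that feature-derivative approximations are approximately Gaussian. The framework can also be applied on top of existing caching methods to improve them.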
Findings
Achieves 5.55x speedup on FLUX.1-dev with maintained or improved quality.
Enhances performance of previous caching methods, e.g., ClusCa.
Effective across text-to-image, video, and super-resolution tasks.
Abstract
Diffusion models have achieved remarkable success in content generation but often incur prohibitive computational costs due to iterative sampling. Recent feature caching methods accelerate inference via temporal extrapolation, yet can suffer quality degradation from inaccurate modeling of the complex dynamics of feature evolution. We propose HiCache (Hermite Polynomial-based Feature Cache), a training-free acceleration framework that improves feature prediction by aligning mathematical tools with empirical properties. Our key insight is that feature-derivative approximations in diffusion Transformers exhibit multivariate Gaussian characteristics, motivating the use of Hermite polynomials as a potentially optimal basis for Gaussian-correlated processes. We further introduce a dual-scaling mechanism that ensures numerical stability while preserving predictive accuracy, and is also…
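The idea of swapping the extrapolation basis can be illustrated with a minimal sketch. It assumes a TaylorSeer-style forecast built from backward finite differences of cached features, and it reads the dual-scaling mechanism as a basis of the form σ^i · H_i(σk), where H_i is the physicists' Hermite polynomial and k is the number of steps predicted ahead. That form, the function names, and the use of scalar features in place of tensors are all assumptions made for illustration, not the paper's implementation.

```python
from math import factorial


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x*H_k - 2k*H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def finite_differences(history, order):
    """Backward differences [d^1, ..., d^order] evaluated at the newest cached entry."""
    assert len(history) > order, "need order + 1 cached entries"
    diffs, level = [], list(history)
    for _ in range(order):
        level = [b - a for a, b in zip(level, level[1:])]
        diffs.append(level[-1])
    return diffs


def forecast(history, k, order, basis):
    """Predict the feature k steps ahead as last + sum_i d^i / i! * basis(i, k)."""
    pred = history[-1]
    for i, d in enumerate(finite_differences(history, order), start=1):
        pred += d / factorial(i) * basis(i, k)
    return pred


def taylor_basis(i, k):
    """Monomial basis k^i, as in a Taylor-style forecast."""
    return float(k) ** i


def scaled_hermite_basis(sigma):
    """Hypothetical dual-scaled Hermite basis: sigma^i * H_i(sigma * k)."""
    return lambda i, k: sigma ** i * hermite(i, sigma * k)


# Toy usage: scalar "features" cached at four consecutive steps.
cached = [1.0, 4.0, 9.0, 16.0]
taylor_pred = forecast(cached, k=1, order=2, basis=taylor_basis)
hermite_pred = forecast(cached, k=1, order=2, basis=scaled_hermite_basis(0.5))
```

Only the `basis` argument differs between the two forecasts, which mirrors the reviewers' remark that the method keeps TaylorSeer's implementation form and replaces the polynomial basis. The value `0.5` for σ is arbitrary here.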
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper addresses a practical and underexplored efficiency bottleneck specific to diffusion-based language models, where repeated denoising iterations make KV caching more memory-intensive than in autoregressive models. This focus is well motivated and timely.
2. HiCache successfully adapts hierarchical caching concepts from systems design to the setting of diffusion language models. The integration into the diffusion pipeline is neat and minimally invasive, requiring no retraining or archi
Although HiCache is well designed and practically useful, its applicability is limited to diffusion-based language models. The caching and reuse patterns exploited here rely on the iterative refinement process of diffusion models, which differ substantially from autoregressive decoding. The evaluation focuses on throughput and memory reduction, but latency variance and system scalability are not thoroughly discussed. Diffusion inference involves synchronized denoising steps, so delayed cold-ca
The key strengths of the paper lie in its strong theoretical foundation and practical effectiveness. HiCache introduces a principled improvement over Taylor-based caching by recognizing that diffusion transformer features evolve according to approximately Gaussian dynamics. By replacing Taylor’s monomial basis with scaled Hermite polynomials, which are theoretically optimal for Gaussian-correlated processes, the method provides a mathematically sound and statistically aligned framework for featu
The main weaknesses of the paper stem from its scope, assumptions, and evaluation coverage. HiCache is designed specifically for Diffusion Transformers (DiTs) and relies heavily on the assumption that feature derivatives follow Gaussian statistics. While this is empirically validated for certain architectures like FLUX, the assumption may not hold universally across other diffusion models, such as U-Net–based or multi-modal architectures. Likewise, the framework’s reliance on Hermite polynomials
1. The paper replaces Taylor’s monomial basis with Hermite polynomials derived from Gaussian feature correlations, leveraging Karhunen–Loève optimality and a single scaling factor σ to improve stability and accuracy.
2. HiCache preserves almost the same implementation form as TaylorSeer, merely replacing the polynomial basis and adding a few scalar evaluations, thereby allowing direct integration into any feature caching–based acceleration framework with negligible computational overhead.
3. Ext
1. The paper primarily relies on automated metrics such as PSNR, SSIM, LPIPS, and VBench. Incorporating human preference evaluations would make the assessment more convincing.
2. The paper heavily relies on the scaling factor σ to stabilize predictions, yet it lacks a principled rule or analysis on how to select or adapt σ across architectures, acceleration ratios, or polynomial orders.
