TL;DR
This paper introduces test-time sparsity to accelerate action diffusion models, reducing FLOPs by 92% and increasing inference speed fivefold without performance loss.
Contribution
It proposes a novel parallelized inference pipeline and an omnidirectional feature reuse strategy that together enable lossless, high-speed action diffusion with significant computational savings.
Findings
Reduced FLOPs by 92%
Achieved 5x faster inference speed
Maintained performance at an inference frequency of 47.5 Hz
Abstract
Action diffusion excels at high-fidelity action generation but incurs heavy computational costs owing to its iterative denoising nature. Although existing techniques show promise in accelerating diffusion transformers by reusing cached features, they struggle to adapt to policy dynamics arising from diverse perceptions and multi-round rollout iterations in open environments. We propose test-time sparsity to tackle this challenge, which aims to accelerate action diffusion by dynamically predicting prunable residual computations for each model forward pass at test time. However, two bottlenecks remain in this paradigm: 1) repetitive conditional encoding and pruning offset most potential speed gains, and 2) the features cached from previous denoising timesteps cannot constrain large pruning errors under aggressive sparsity. To address the first bottleneck, we design a highly parallelized…
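The general idea of pruning residual computations and reusing cached features can be illustrated with a toy sketch. This is not the paper's implementation: the names `make_block`, `CachedResidualStack`, and `run_demo` are hypothetical, the residual branches are stand-in arithmetic rather than attention or MLP layers, and the prune masks are fixed by hand, whereas the abstract describes predicting them dynamically at test time. The sketch only shows the baseline mechanism the abstract refers to, where a pruned branch reuses the residual cached at an earlier denoising step.

```python
from typing import Callable, List, Optional

Vector = List[float]


def make_block(scale: float) -> Callable[[Vector, float], Vector]:
    """Toy residual branch (hypothetical stand-in for an attention/MLP sub-layer)."""
    def branch(x: Vector, t: float) -> Vector:
        return [scale * xi + 0.01 * t for xi in x]
    return branch


class CachedResidualStack:
    """Stack of residual branches that can skip a branch and reuse its cached output."""

    def __init__(self, branches: List[Callable[[Vector, float], Vector]]) -> None:
        self.branches = branches
        self.cache: List[Optional[Vector]] = [None] * len(branches)
        self.computed = 0  # number of branch evaluations actually executed

    def forward(self, x: Vector, t: float, prune_mask: List[bool]) -> Vector:
        for i, branch in enumerate(self.branches):
            cached = self.cache[i]
            if prune_mask[i] and cached is not None:
                # Pruned: reuse the residual cached at an earlier denoising step.
                residual = cached
            else:
                # Not pruned (or nothing cached yet): compute and refresh the cache.
                residual = branch(x, t)
                self.cache[i] = residual
                self.computed += 1
            x = [xi + ri for xi, ri in zip(x, residual)]
        return x


def run_demo():
    stack = CachedResidualStack([make_block(0.1), make_block(-0.05), make_block(0.02)])
    x = [1.0, -2.0]
    # Assumption: hand-written masks; a real system would predict these per forward pass.
    masks = [
        [False, False, False],  # first step is dense, which fills the cache
        [True, False, True],
        [True, True, True],
        [False, True, True],
    ]
    for step, mask in enumerate(masks):
        t = 1.0 - step / len(masks)
        x = stack.forward(x, t, mask)
    dense = len(masks) * len(stack.branches)
    return x, stack.computed, dense


if __name__ == "__main__":
    _, computed, dense = run_demo()
    print(f"branch evaluations: {computed} of {dense}")
```

In this toy run only a fraction of the branch evaluations are executed. The reused residuals are stale, which is the source of the pruning error that the abstract's second bottleneck refers to under aggressive sparsity.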
