Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning

Kangye Ji; Jianbo Zhou; Yuan Meng; Ye Li; Hanyun Cui; Zhi Wang

arXiv:2601.12894 · cs.RO · May 18, 2026

TL;DR

Sparse ActionGen (SAG) accelerates diffusion-based action generation for robots through adaptive pruning and activation reuse, achieving up to 4× speedup in real-time control without performance loss.

Contribution

SAG introduces a rollout-adaptive prune-then-reuse mechanism and an environment-aware diffusion pruner for efficient, real-time sparse action generation in robotic applications.

Findings

1. Achieves up to 4× speedup in action generation.
2. Maintains performance while accelerating the diffusion process.
3. Demonstrates effectiveness on multiple robotic benchmarks.

Abstract

Diffusion Policy has dominated action generation due to its strong capabilities for modeling multi-modal action distributions, but its multi-step denoising processes make it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on *static* schedules that fail to adapt to the *dynamics* of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose Sparse ActionGen (SAG) for extremely sparse action generation. To accommodate the iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute them during action diffusion. To capture the rollout dynamics, SAG parameterizes an…
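The prune-then-reuse idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `sparse_denoise`, the scalar "blocks", the hand-written `sparse_mask`, and the simplified update rule are all hypothetical stand-ins. The sketch assumes a boolean mask of shape (denoising steps × blocks) is already available and that a pruned block is replaced by its most recently cached output.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Block = Callable[[float], float]  # toy stand-in for a network block


def sparse_denoise(
    x: float,
    blocks: Sequence[Block],
    prune_mask: Sequence[Sequence[bool]],
    step_size: float = 0.1,
) -> Tuple[float, int]:
    """Run one toy denoising pass; prune_mask[k][b] == True means
    'skip block b at step k and reuse its cached output'."""
    cache: List[Optional[float]] = [None] * len(blocks)
    n_computed = 0
    for step_mask in prune_mask:
        h = x
        for b, block in enumerate(blocks):
            if step_mask[b] and cache[b] is not None:
                residual = cache[b]      # reuse the cached activation
            else:
                residual = block(h)      # compute and refresh the cache
                cache[b] = residual
                n_computed += 1
            h = h + residual             # residual connection
        x = x - step_size * h            # simplified update (assumption)
    return x, n_computed


# Hypothetical example: 4 denoising steps, 2 blocks.
toy_blocks = [lambda h: 0.1 * h, lambda h: -0.05 * h]
dense_mask = [[False, False]] * 4
sparse_mask = [[False, False], [True, False], [True, True], [False, True]]

_, dense_evals = sparse_denoise(1.0, toy_blocks, dense_mask)
_, sparse_evals = sparse_denoise(1.0, toy_blocks, sparse_mask)
print(dense_evals, sparse_evals)  # 8 4
```

In this toy run the mask halves the number of block evaluations. In SAG the mask is described as being predicted per rollout rather than fixed by hand, which is the part this sketch deliberately leaves out.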

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

All design choices seem reasonable and are properly ablated, and overall improvement over other diffusion pruning methods seems substantial.

Weaknesses

- I found the motivation in the abstract and introduction at odds with the actual algorithm. The authors motivated the study of Diffusion policies by their ability to model multi-modal distributions. Since many deep RL algorithms' exploration heuristics use stochastic policies, it seems indeed important to have policies that can model a wider range of distributions. However, the authors only consider behavioral cloning of an expert policy, and while multi-modal and stochastic policies might be h

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper addresses a **highly important and timely research problem**, especially as diffusion models continue to gain prominence in **imitation learning**, **reinforcement learning**, and **Vision-Language-Action (VLA)** modeling.
2. The paper presents **extensive simulation experiments**, offering strong empirical evidence for the effectiveness and robustness of the proposed method.
3. The paper is **well-written**, **clearly structured**, and **easy to follow**, effectively communicatin

Weaknesses

### Major Weakness:

1. The authors are strongly encouraged to include comparisons with traditional **diffusion acceleration methods**, such as **DDIM** or **Consistency Policy**, to enhance the **completeness** and **thoroughness** of the paper’s experimental evaluation.
2. It is noted that in RoboMimic tasks, SAG even achieves higher performance compared to Diffusion Policy with full denoising process. The authors are highly recommended to dive deeper into this phenomenon instead of just conc

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Clear problem framing and motivation. The paper grounds the latency issue of diffusion policies in realistic control frequencies (e.g., 50 steps × 1 ms ≈ 50 ms → 20 Hz on RTX 4090; insufficient for Franka 50–1000 Hz), which is a compelling, concrete rationale for acceleration beyond image generation settings.
2. Methodological novelty: observation-conditioned, real-time pruning. The real-time diffusion pruner predicts a binary mask for all K timesteps and 3L blocks in a single forward pass
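The latency arithmetic quoted by the reviewer (50 steps × 1 ms ≈ 50 ms → 20 Hz) can be checked with a few lines. The helper `control_frequency_hz` is a hypothetical name, and applying the headline "up to 4×" speedup uniformly to the whole 50 ms is an illustrative assumption, not a figure reported in the reviews.

```python
def control_frequency_hz(
    num_steps: int, ms_per_step: float, speedup: float = 1.0
) -> float:
    """Achievable control frequency when each action requires
    num_steps denoising steps of ms_per_step milliseconds each."""
    latency_ms = num_steps * ms_per_step / speedup
    return 1000.0 / latency_ms


print(control_frequency_hz(50, 1.0))       # 20.0 (dense baseline)
print(control_frequency_hz(50, 1.0, 4.0))  # 80.0 (assumed uniform 4x speedup)
```

Under that assumption a 4× speedup would move a 20 Hz policy to 80 Hz, which clears the lower end of the 50–1000 Hz range cited for Franka but not the upper end.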

Weaknesses

1. Lack of real-robot validation. All evaluations appear to be simulation-based (RoboMimic tasks, Franka Kitchen). For claims of real-time control, a small-scale hardware validation (latency stability, sensor noise, control jitter) would substantially strengthen the case.
2. Runtime analysis is mostly relative; absolute latencies are under-reported. While speedup factors are clear, the paper would benefit from absolute inference time per control step (ms) and achieved control frequency (Hz) fo
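A minimal sketch of how the absolute numbers requested in the second point (ms per control step, achieved Hz) might be collected. `measure_latency` and `dummy_policy_step` are hypothetical names; the harness times an arbitrary callable on the CPU, and for GPU inference a device synchronization before each timestamp would likely be needed as well.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    policy_step: Callable[[], object], warmup: int = 3, repeats: int = 20
) -> Dict[str, float]:
    """Time repeated calls to policy_step and report mean/max latency
    in milliseconds plus the control frequency implied by the mean."""
    for _ in range(warmup):          # discard warm-up iterations
        policy_step()
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        policy_step()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    hz = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"mean_ms": mean_ms, "max_ms": max(samples_ms), "hz": hz}


def dummy_policy_step() -> int:
    # Placeholder workload standing in for one action-generation call.
    return sum(range(10_000))


report = measure_latency(dummy_policy_step)
print(sorted(report))  # ['hz', 'max_ms', 'mean_ms']
```

Reporting the maximum alongside the mean gives a rough view of latency jitter, which the first point raises as a concern for real-time control.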

Code & Models

Repositories

https://sparse-actiongen.github.io

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Human Motion and Animation