TL;DR
DiscreteRTC leverages discrete diffusion policies for asynchronous execution in dynamic tasks, offering higher success rates, lower latency, and a simpler implementation than flow-matching RTC methods.
Contribution
The paper introduces DiscreteRTC, a novel approach that uses the native unmasking process of discrete diffusion policies for asynchronous execution, overcoming the limitations of RTC with flow-matching policies.
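The sketch below illustrates why unmasking gives inpainting "for free": committed actions are placed into the chunk as already-decoded tokens and are never masked, so the policy only fills in the remaining positions and no inference-time correction is needed. It is a minimal sketch under stated assumptions, not the paper's implementation: actions are assumed to be discretized into integer tokens, unmasking follows a confidence-ordered (MaskGIT-style) schedule, and `unmask_with_prefix`, `toy_predict`, and `MASK` are hypothetical names, with `toy_predict` standing in for the policy network.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical sentinel for a masked (not yet generated) action token

# A predictor maps the current partially masked chunk to a (token, confidence)
# guess for every position. Hypothetical stand-in for the policy network.
Predictor = Callable[[Sequence[int]], List[Tuple[int, float]]]


def unmask_with_prefix(committed: Sequence[int], horizon: int,
                       predict: Predictor, num_steps: int) -> List[int]:
    """Generate a chunk of `horizon` action tokens whose first entries are `committed`."""
    if len(committed) > horizon or num_steps < 1:
        raise ValueError("need len(committed) <= horizon and num_steps >= 1")
    # Committed actions enter as already-decoded tokens; only the rest is masked.
    tokens = list(committed) + [MASK] * (horizon - len(committed))
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Spread the remaining masked positions evenly over the remaining steps,
        # so the final step always unmasks everything that is left.
        k = math.ceil(len(masked) / (num_steps - step))
        preds = predict(tokens)
        # Unmask the k masked positions the predictor is most confident about.
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:k]:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens: Sequence[int]) -> List[Tuple[int, float]]:
    """Hypothetical predictor: guess 'last known token + 1', more confident near the start."""
    preds = []
    for i in range(len(tokens)):
        prev = next((tokens[j] for j in range(i - 1, -1, -1) if tokens[j] != MASK), 0)
        preds.append(((prev + 1) % 256, 1.0 / (1 + i)))
    return preds


if __name__ == "__main__":
    print(unmask_with_prefix([10, 11], horizon=6, predict=toy_predict, num_steps=2))
```

Because the committed prefix is never masked, it survives every unmasking step unchanged; with `committed=[10, 11]`, `horizon=6`, and `num_steps=2`, the call returns `[10, 11, 12, 12, 13, 13]` (the repeated tokens come from the toy predictor decoding two positions in parallel).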
Findings
Higher success rates on dynamic benchmarks and real-world tasks.
Faster inference with only 0.7x additional computation.
50% higher success rate in real-world pick tasks.
Abstract
Unlike chatbots, physical AI must act while the world keeps evolving. The inter-chunk pauses of synchronous executors are therefore fatal for dynamic tasks, regardless of how fast inference is. Asynchronous execution (thinking while acting) is thus a structural requirement, and real-time chunking (RTC) makes it viable by recasting chunk transitions as inpainting: freezing committed actions and consistently generating the remainder. However, RTC with a flow-matching policy is structurally suboptimal: its inpainting comes from inference-time corrections rather than from the base policy, which yields little pre-training benefit and requires specific fine-tuning, heuristic guidance, and extra computation that inflates latency. In this work, we observe that discrete diffusion policies, which generate actions by iterative unmasking, are natural asynchronous executors that resolve all limitations…
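As a toy illustration of the timing argument (not the paper's code), the simulation below counts ticks on a single wall clock. A synchronous executor idles for `delay` ticks while each new chunk is computed; an asynchronous executor keeps executing the current chunk, and the `delay` actions it commits during inference become the frozen prefix that RTC-style inpainting must preserve in the next chunk. The function `simulate` and its parameters `horizon`, `exec_steps`, and `delay` are hypothetical, and inference latency is assumed to be a fixed number of control ticks.

```python
from typing import List, Tuple


def simulate(horizon: int, exec_steps: int, delay: int, ticks: int,
             asynchronous: bool) -> Tuple[int, List[int]]:
    """Toy wall-clock model of chunked execution (hypothetical, not the paper's code).

    Each tick the robot either executes one action or idles. Inference for the next
    chunk is launched after `exec_steps` actions and returns `delay` ticks later.
    Returns (idle ticks, frozen-prefix length of each new chunk in asynchronous mode).
    """
    # The old chunk must still hold the actions executed while inference runs.
    if not 0 < delay <= exec_steps or exec_steps + delay > horizon:
        raise ValueError("need 0 < delay <= exec_steps and exec_steps + delay <= horizon")
    idle, frozen = 0, []
    pos = 0          # index of the next action within the current chunk
    inflight = None  # ticks until the in-flight inference call returns, or None
    launch_pos = 0   # chunk index at which that call was launched
    for _ in range(ticks):
        if inflight == 0:
            # The new chunk is aligned to the launch tick, so the actions executed
            # while inference ran are its committed (frozen) prefix.
            committed = pos - launch_pos  # always 0 for the synchronous executor
            if asynchronous:
                frozen.append(committed)
            pos, inflight = committed, None
        if inflight is None and pos == exec_steps:
            inflight, launch_pos = delay, pos  # start inference for the next chunk
        if inflight is not None and not asynchronous:
            idle += 1  # synchronous executor pauses while inference runs
        else:
            pos += 1   # execute one action from the current chunk
        if inflight is not None:
            inflight -= 1
    return idle, frozen


if __name__ == "__main__":
    print(simulate(8, 4, 2, 12, asynchronous=False))
    print(simulate(8, 4, 2, 12, asynchronous=True))
```

With `horizon=8`, `exec_steps=4`, and `delay=2` over 12 ticks, the synchronous run idles for 4 ticks (one third of the time), while the asynchronous run never idles and freezes a two-action prefix in each new chunk, returning `(0, [2, 2])`.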
