Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
Chaoyi Pan, Giri Anantharaman, Nai-Chieh Huang, Claire Jin, Daniel Pfrommer, Chenyang Yuan, Frank Permenter, Guannan Qu, Nicholas Boffi, Guanya Shi, Max Simchowitz

TL;DR
This paper critically evaluates generative control policies (GCPs) in robotics and finds that their success stems mainly from iterative computation with supervised intermediate steps and suitable stochasticity, not from an ability to model multi-modal or more complex behaviors.
Contribution
It challenges the common belief that generative policies succeed because they capture multi-modal action distributions, showing instead that supervised iterative computation is the key factor.
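The common belief rests on a standard observation: a deterministic policy trained with mean-squared error on multi-modal demonstrations averages the modes. The toy sketch below uses synthetic data that is not from the paper. It illustrates only that textbook effect. The paper's claim is that this effect does not explain GCP success on the benchmarks it studies.

```python
import numpy as np

# Synthetic bimodal demonstrations for a single observation:
# half of the demos steer left (-1.0), half steer right (+1.0).
rng = np.random.default_rng(0)
actions = np.where(rng.random(1000) < 0.5, -1.0, 1.0)

# A constant regressor trained with mean-squared error minimizes
# E[(a - c)^2]; search a grid of candidate constants for that minimizer.
candidates = np.linspace(-1.5, 1.5, 301)
mse = [np.mean((actions - c) ** 2) for c in candidates]
best_constant = candidates[int(np.argmin(mse))]

# The optimum sits between the modes, near the sample mean,
# where no demonstration actually lies.
print(f"MSE-optimal action: {best_constant:.2f}, sample mean: {actions.mean():.2f}")
print("Demos within 0.5 of it:", np.mean(np.abs(actions - best_constant) < 0.5))
```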
Findings
Iterative computation with supervised intermediate steps explains GCP success.
A simple two-step policy matches or exceeds complex GCPs.
Fitting the action distribution well matters less for control performance than commonly assumed.
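The findings above name the ingredients: iterative computation, supervision of each intermediate step, and stochasticity during training. The sketch below is a hypothetical two-step policy that combines them. It is not the authors' MIP. Several choices are assumptions made for illustration only: the data, the random-feature stand-in for a neural network, placing Gaussian noise on the step-1 output, and the value of `sigma`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic expert data (illustrative only): a nonlinear observation-to-action map.
obs_dim, act_dim, n = 4, 2, 2000
W_true = rng.normal(size=(act_dim, obs_dim))
obs = rng.normal(size=(n, obs_dim))
acts = np.tanh(obs @ W_true.T)


def make_model(in_dim, width=64, seed=0):
    """Fixed random tanh features; a cheap, hypothetical stand-in for a network."""
    r = np.random.default_rng(seed)
    return r.normal(size=(in_dim, width)) / np.sqrt(in_dim), r.normal(size=width)


def feats(x, model):
    """Random tanh features plus the raw inputs and a bias column."""
    proj, bias = model
    hidden = np.tanh(x @ proj + bias)
    return np.concatenate([hidden, x, np.ones((x.shape[0], 1))], axis=1)


def fit(x, y, model):
    """Least-squares readout, standing in for gradient-based training."""
    coef, *_ = np.linalg.lstsq(feats(x, model), y, rcond=None)
    return coef


# Step 1 is supervised to predict the action from the observation alone.
model1 = make_model(obs_dim, seed=10)
coef1 = fit(obs, acts, model1)
step1 = feats(obs, model1) @ coef1

# Step 2 is supervised on the same target but also sees a noised copy of the
# step-1 output (assumed noise placement; sigma is an assumed hyperparameter).
sigma = 0.1
model2 = make_model(obs_dim + act_dim, seed=11)
noisy_step1 = step1 + sigma * rng.normal(size=step1.shape)
coef2 = fit(np.concatenate([obs, noisy_step1], axis=1), acts, model2)


def policy(o):
    """Deterministic two-step inference: no noise is injected at test time."""
    a1 = feats(o, model1) @ coef1
    return feats(np.concatenate([o, a1], axis=1), model2) @ coef2


pred = policy(obs)
print("step-1 MSE:", np.mean((step1 - acts) ** 2))
print("two-step MSE:", np.mean((pred - acts) ** 2))
```

The noise appears only in training, so inference stays a fixed two-pass computation. This sketch does not reproduce how the paper chooses or tunes the noise level.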
Abstract
Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimum iterative policy (MIP), a lightweight two-step…
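The abstract attributes GCP success to iterative computation whose intermediate steps are supervised and paired with stochasticity. The sketch below uses one standard instance of a flow-based policy to show where each ingredient enters. That instance is conditional flow matching with a linear interpolation path, chosen here as an assumption rather than taken from the paper's configurations. Each sampled time `t` yields its own regression target, and Gaussian source noise supplies the stochasticity. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def flow_matching_batch(actions, rng):
    """Build supervised (input, target) pairs at random intermediate times.

    Uses the linear path x_t = (1 - t) * x0 + t * a with Gaussian source
    noise x0; the regression target at every t is the velocity a - x0.
    """
    n = actions.shape[0]
    x0 = rng.normal(size=actions.shape)   # stochasticity: source noise
    t = rng.random((n, 1))                # random intermediate time in [0, 1)
    x_t = (1.0 - t) * x0 + t * actions    # point on the interpolation path
    v_target = actions - x0               # supervision for this intermediate step
    return x_t, t, v_target


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0.copy(), 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt)
    return x


# Toy check: if every demonstration is the same action a_star, the exact
# velocity field is (a_star - x) / (1 - t), and Euler integration lands on a_star.
a_star = np.array([0.3, -0.7])
x_t, t, v = flow_matching_batch(np.tile(a_star, (8, 1)), rng)
sample = euler_sample(lambda x, s: (a_star - x) / (1.0 - s), rng.normal(size=2), num_steps=10)
print(sample)
```

A trained policy would replace the oracle lambda with a network regressed onto `v_target` at the sampled `(x_t, t)` pairs.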
Peer Reviews
Decision: ICLR 2026 Poster
1. A key finding of this paper is that when using the same network architecture, traditional Regression Control Policies (RCPs) are highly competitive with modern Generative Control Policies (GCPs). This is an important contribution, as it helps correct a potential misconception in the field (stemming from earlier works that used different architectures for baselines) that diffusion policies inherently offer a large performance gain over standard behavior cloning on these tasks. 2. A major stre
1. The paper's use of the Lipschitz constant as a metric for "expressivity" is a potential point of confusion. As you noted, the Lipschitz constant measures smoothness or robustness (how much the output can change for a small input change), not necessarily the representational capacity (the class of functions a model can learn). While their point (that GCPs find smoother solutions) is valid, their terminology can be debated. 2. The paper's theoretical argument, presented as Theorem 1 (Informal)
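The review describes the Lipschitz constant as "how much the output can change for a small input change." One generic way to probe that quantity for a policy is a sampled lower bound: take random observations, perturb each one slightly, and record the largest output-to-input change ratio. The paper's own estimator is not reproduced here. The function name and the `radius` and `num_pairs` parameters are assumptions.

```python
import numpy as np


def empirical_lipschitz(policy, observations, radius=1e-2, num_pairs=1000, seed=0):
    """Sampled lower bound on the Lipschitz constant of `policy` near the data.

    For random observations o and perturbations d with ||d|| = radius, returns
    max ||policy(o + d) - policy(o)|| / ||d||. Every ratio is at most the true
    constant, so the maximum is a lower bound on it.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(observations), size=num_pairs)
    o = observations[idx]
    d = rng.normal(size=o.shape)
    d *= radius / np.linalg.norm(d, axis=1, keepdims=True)  # rescale to the given radius
    diff = policy(o + d) - policy(o)
    return float(np.max(np.linalg.norm(diff, axis=1) / radius))


# Usage on a linear map, whose true Lipschitz constant is its largest singular value.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
obs = np.random.default_rng(1).normal(size=(500, 2))
estimate = empirical_lipschitz(lambda x: x @ A.T, obs)
print(estimate, np.linalg.norm(A, 2))
```

Consistent with the review's point, a smaller estimate indicates a smoother learned map. It says nothing directly about representational capacity.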
1. I like the motivation of this work. Decomposing and analyzing the underlying reasons behind the success of generative policies is valuable, especially for RL and imitation learning, where understanding why a model works often matters more than just scaling it up. As often said, RL doesn’t necessarily benefit from ever-larger networks like LLMs or VLMs do (IMO since the right actions or behaviors come from a subtle, structured distribution that doesn’t simply improve with more data or capacit
I list both my (tentative) weaknesses and questions together here, since some of them overlap, and a few points might just come from my not fully understanding certain parts of the evaluation. 1. On the conclusion that GCPs cannot really produce multi-modal actions, I have some concerns about the setup of evidence C. Even in deterministic environments and with deterministic policies, the underlying mapping from observation to action can still be stochastic or even multi-modal at the distributio
Rigorous experimental isolation of C1, C2, C3 effects. Unusually broad benchmark coverage (27 tasks). Strong ablations: flow vs regression vs MIP. Negative result clearly demonstrated and well-reasoned. Practical takeaway: simple two-step deterministic policy matches flow performance. “Manifold adherence” metric offers a novel lens on robustness.
All evaluations are offline imitation; no fine-tuning or real-robot rollouts to confirm closed-loop stability. MIP’s hyperparameters (e.g. noise scale) are fixed; sensitivity analysis would help.
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis
