TL;DR
KeyStone is an inference-time self-consistency method for diffusion-based physical AI models. It samples multiple candidate action trajectories in parallel, clusters them, and returns the medoid of the largest cluster, improving task success rates without additional training.
Contribution
The paper introduces a geometry-guided, judge-free clustering approach for inference-time self-consistency in diffusion-based physical AI models.
Findings
Up to 13.3% improvement in task success rates.
No additional model training required for self-consistency.
Negligible latency overhead during inference.
Abstract
State-of-the-art physical AI models generate a chunk of actions per inference through diffusion or flow matching, iteratively refining an initial noise sample into an action trajectory. Because this inference process is inherently stochastic, committing to a single trajectory per round is brittle, and this brittleness compounds across the many sequential rounds that comprise a complete episode. We introduce KeyStone, an inference-time self-consistency method for diffusion-based action generation that draws candidate action chunks in parallel from a shared model context, clusters them in continuous action space, and returns the medoid of the largest cluster -- no additional model required. Two properties make this practical. First, the compact nature of action trajectories makes diffusion inference memory-bandwidth bound, leaving spare compute capacity to run chains in parallel…
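The selection step described in the abstract (cluster the candidate chunks in continuous action space, then return the medoid of the largest cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm or distance KeyStone uses, so the code below assumes Euclidean distance on flattened chunks and a simple threshold-linkage clustering (connected components). The function name `select_consensus_chunk` and the `radius` parameter are hypothetical.

```python
import numpy as np


def select_consensus_chunk(candidates: np.ndarray, radius: float) -> int:
    """Return the index of the medoid of the largest cluster.

    candidates: array of shape (K, H, D), i.e. K candidate action chunks,
        each with horizon H and action dimension D.
    radius: hypothetical distance threshold; two chunks closer than this
        are linked into the same cluster (an assumption, not from the paper).
    """
    k = candidates.shape[0]
    flat = candidates.reshape(k, -1)

    # Pairwise Euclidean distances between flattened chunks, shape (K, K).
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Threshold-linkage clustering: connected components of the graph that
    # has an edge wherever dist <= radius.
    labels = np.full(k, -1)
    current = 0
    for seed in range(k):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((dist[i] <= radius) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1

    # Largest cluster (ties go to the lowest cluster label).
    largest = np.bincount(labels).argmax()
    members = np.flatnonzero(labels == largest)

    # Medoid: the member with the smallest total distance to the others.
    within = dist[np.ix_(members, members)].sum(axis=1)
    return int(members[within.argmin()])


# Toy example: three chunks agree near 0, two outliers sit near 10.
toy = np.array([
    [[0.0], [0.0]],
    [[0.1], [0.1]],
    [[0.2], [0.2]],
    [[10.0], [10.0]],
    [[10.1], [10.1]],
])
chosen = select_consensus_chunk(toy, radius=0.5)  # index 1, the middle of the majority
```

Returning a medoid rather than a mean guarantees that the selected chunk is one the model actually generated, which matters when averaging trajectories from different modes would produce an action sequence that belongs to neither.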
