Geometry Guided Self-Consistency for Physical AI

Yinwei Dai, Zhuofu Chen, Lijie Yang, Ravi Netravali

arXiv:2605.08638 · cs.RO · May 12, 2026

TL;DR

KeyStone is a self-consistency inference method for diffusion-based physical AI models that improves task success rates by clustering multiple candidate trajectories without additional training.

Contribution

It introduces a geometry-guided, judge-free clustering approach for inference-time self-consistency in diffusion-based physical AI models.

Findings

01. Up to 13.3% improvement in task success rates.

02. No additional model training required for self-consistency.

03. Negligible latency overhead during inference.

Abstract

State-of-the-art physical AI models generate a chunk of actions per inference through diffusion or flow matching, iteratively refining an initial noise sample into an action trajectory. Because this inference process is inherently stochastic, committing to a single trajectory per round is brittle, and this brittleness compounds across the many sequential rounds that comprise a complete episode. We introduce KeyStone, an inference-time self-consistency method for diffusion-based action generation that draws $K$ candidate action chunks in parallel from a shared model context, clusters them in continuous action space, and returns the medoid of the largest cluster -- no additional model required. Two properties make this practical. First, the compact nature of action trajectories makes diffusion inference memory-bandwidth bound, leaving spare compute capacity to run $K$ chains in parallel…
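The selection step described in the abstract (cluster the $K$ candidates in action space, then return the medoid of the largest cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering rule or distance KeyStone uses, so the sketch assumes Euclidean distance over flattened action chunks and a simple connected-components rule with a hypothetical `radius` threshold. The function name `select_consensus_chunk` and the array shapes are likewise illustrative.

```python
import numpy as np


def select_consensus_chunk(candidates: np.ndarray, radius: float) -> int:
    """Return the index of the medoid of the largest cluster.

    candidates: shape (K, horizon, action_dim), K sampled action chunks.
    radius: hypothetical threshold; chunks within this distance are linked.
    """
    k = candidates.shape[0]
    flat = candidates.reshape(k, -1)
    # Pairwise Euclidean distances between flattened chunks, shape (K, K).
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)

    # Assumed clustering rule: connected components of "distance <= radius".
    labels = np.full(k, -1)
    n_clusters = 0
    for seed in range(k):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((dists[i] <= radius) & (labels == -1)):
                labels[j] = n_clusters
                stack.append(j)
        n_clusters += 1

    # Largest cluster; ties go to the lowest cluster id.
    largest = np.argmax(np.bincount(labels, minlength=n_clusters))
    members = np.flatnonzero(labels == largest)
    # Medoid: the member with the smallest total distance to the others.
    within = dists[np.ix_(members, members)].sum(axis=1)
    return int(members[np.argmin(within)])


# Synthetic example: five chunks near one mode, two near another.
rng = np.random.default_rng(0)
mode_a = rng.normal(0.0, 0.01, size=(5, 4, 2))
mode_b = rng.normal(1.0, 0.01, size=(2, 4, 2))
candidates = np.concatenate([mode_a, mode_b])
idx = select_consensus_chunk(candidates, radius=0.5)
```

Because the medoid is always one of the sampled chunks, the returned action is something the model actually generated rather than an average that may fall between modes. In the synthetic example the selected index falls among the first five candidates, the majority mode.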

Code & Models

Repositories

dywsjtu/keystone (GitHub)