OmniGuide: Universal Guidance Fields for Enhancing Generalist Robot Policies

Yunzhou Song; Long Le; Yong-Hyun Park; Jie Wang; Junyao Shi; Lingjie Liu; Jiatao Gu; Eric Eaton; Dinesh Jayaraman; Kostas Daniilidis

arXiv:2603.10052 · cs.RO · March 12, 2026

TL;DR

OMNIGUIDE introduces a flexible framework that enhances vision-language-action models for complex tasks by integrating diverse guidance sources as differentiable energy functions, significantly improving performance in simulation and real-world scenarios.

Contribution

OMNIGUIDE presents a unified approach for incorporating diverse guidance sources into VLA models, matching or exceeding prior guidance-specific methods on complex tasks.

Findings

1. Improves success and safety rates of generalist policies.
2. Matches or exceeds performance of guidance-specific prior methods.
3. Effective in both simulation and real-world environments.

Abstract

Vision-language-action (VLA) models have shown great promise as generalist policies for a large range of relatively simple tasks. However, they demonstrate limited performance on more complex tasks, such as those requiring complex spatial or semantic understanding, manipulation in clutter, or precise manipulation. We propose OMNIGUIDE, a flexible framework that improves VLA performance on such tasks by leveraging arbitrary sources of guidance, such as 3D foundation models, semantic-reasoning VLMs, and human pose models. We show how many kinds of guidance can be naturally expressed as differentiable energy functions, with task-specific attractors and repellers located in 3D space, that influence the sampling of VLA actions. In this way, OMNIGUIDE enables guidance sources with complementary task-relevant strengths to improve a VLA model's performance on challenging tasks. Extensive…
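
To make the idea of an energy function with attractors and repellers concrete, here is a minimal Python sketch. It is not the paper's formulation, which this excerpt does not give: the quadratic attractor well, the Gaussian repeller bump, the gains `k_att` and `k_rep`, the width `sigma`, and the function names `energy`, `energy_grad`, and `guided_step` are all assumptions chosen for illustration. The sketch nudges a proposed 3D end-effector displacement along the negative energy gradient, which is one simple way a differentiable energy could influence an action.

```python
import math

def energy(p, attractor, repeller, k_att=1.0, k_rep=1.0, sigma=0.1):
    """Toy energy (assumed form): quadratic well at the attractor
    plus a Gaussian bump centered on the repeller."""
    d_att = sum((pi - ai) ** 2 for pi, ai in zip(p, attractor))
    d_rep = sum((pi - ri) ** 2 for pi, ri in zip(p, repeller))
    return 0.5 * k_att * d_att + k_rep * math.exp(-d_rep / (2 * sigma ** 2))

def energy_grad(p, attractor, repeller, k_att=1.0, k_rep=1.0, sigma=0.1):
    """Analytic gradient of the toy energy with respect to the 3D point p."""
    d_rep = sum((pi - ri) ** 2 for pi, ri in zip(p, repeller))
    bump = k_rep * math.exp(-d_rep / (2 * sigma ** 2))
    return [
        k_att * (pi - ai) - bump * (pi - ri) / sigma ** 2
        for pi, ai, ri in zip(p, attractor, repeller)
    ]

def guided_step(p, base_delta, attractor, repeller, guidance_scale=0.01):
    """Apply a hypothetical policy displacement, then shift it down the
    energy gradient. guidance_scale=0 recovers the unguided step."""
    g = energy_grad(p, attractor, repeller)
    return [pi + bi - guidance_scale * gi for pi, bi, gi in zip(p, base_delta, g)]

# Illustrative values only: a goal to reach and an obstacle to avoid.
start = [0.0, 0.0, 0.0]
goal = [0.5, 0.0, 0.2]
obstacle = [0.25, 0.05, 0.1]
nudged = guided_step(start, [0.0, 0.0, 0.0], goal, obstacle)
```

With a zero base displacement, `nudged` is a plain gradient-descent step on the toy energy. In a real system the base displacement would come from the policy, and the guidance term would be combined with the policy's own sampling procedure rather than added afterwards.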

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning