AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
Wenbo Li, Shiyi Wang, Yiteng Chen, Huiping Zhuang, Qingyao Wu

TL;DR
AntiGrounding is a framework that lifts candidate robot actions into the VLM representation space for decision making, enabling zero-shot trajectory synthesis and using past experience to improve long-term performance.
Contribution
It reverses instruction grounding by lifting candidate actions directly into the VLM representation space, selects actions through multi-view trajectory rendering and structured visual question answering, and adds an offline policy refinement module.
Findings
Outperforms baselines in simulation and real-world tasks
Enables zero-shot synthesis of robot trajectories
Improves long-term performance through offline refinement
Abstract
Vision-Language Models (VLMs) encode knowledge and reasoning capabilities for robotic manipulation within high-dimensional representation spaces. However, current approaches often project them into compressed intermediate representations, discarding important task-specific information such as fine-grained spatial or semantic details. To address this, we propose AntiGrounding, a new framework that reverses the instruction grounding process. It lifts candidate actions directly into the VLM representation space, renders trajectories from multiple views, and uses structured visual question answering for instruction-based decision making. This enables zero-shot synthesis of optimal closed-loop robot trajectories for new tasks. We also propose an offline policy refinement module that leverages past experience to enhance long-term performance. Experiments in both simulation and real-world…
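The selection loop the abstract describes (lift candidate actions, render each trajectory from several views, let a VLM judge them) can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole projection stands in for rendering, and `make_stub_scorer` is a hypothetical placeholder for the structured visual question answering step, which in the real system would query a VLM on rendered images. All function names, camera parameters, and the goal point are assumptions made for illustration.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Pinhole projection of Nx3 world points into Nx2 pixel coordinates."""
    cam = points_world @ R.T + t        # world frame -> camera frame
    uv = cam @ K.T                      # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]       # perspective divide

def select_trajectory(candidates, cameras, score_fn):
    """Project every candidate into every view, score it, return the best index."""
    scores = []
    for traj in candidates:
        views = [project_points(traj, K, R, t) for (K, R, t) in cameras]
        scores.append(score_fn(views))
    return int(np.argmax(scores)), scores

def make_stub_scorer(goal_world, cameras):
    """Hypothetical stand-in for the VLM: rewards trajectories that end
    near the goal in every view. A real system would ask a VLM instead."""
    goal_px = [project_points(goal_world[None], K, R, t)[0] for (K, R, t) in cameras]
    def score(views):
        return -sum(np.linalg.norm(v[-1] - g) for v, g in zip(views, goal_px))
    return score

# Assumed toy setup: two cameras sharing intrinsics, offset along x.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
cameras = [
    (K, np.eye(3), np.array([0.0, 0.0, 2.0])),
    (K, np.eye(3), np.array([0.5, 0.0, 2.0])),
]

goal = np.array([0.2, 0.1, 0.0])
start = np.zeros(3)
candidates = [
    np.linspace(start, goal, 10),    # candidate 0 ends at the goal
    np.linspace(start, -goal, 10),   # candidate 1 moves away from it
]

best, scores = select_trajectory(candidates, cameras, make_stub_scorer(goal, cameras))
print("selected candidate:", best)
```

The scorer is passed in as a function so that the geometric stub could be swapped for a VLM query without changing the selection loop. In a closed-loop setting this selection would be repeated after each executed step.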
Taxonomy
Topics: Constraint Satisfaction and Optimization · Data Visualization and Analytics
