EVE: A Generator-Verifier System for Generative Policies
Yusuf Ali, Gryphon Patlin, Karthik Kothuri, Muhammad Zubair Irshad, Wuwei Liang, Zsolt Kira

TL;DR
EVE is a modular framework that enhances pretrained generative policies for embodied control by employing zero-shot visual verifier agents at test time, significantly improving task success without additional training.
Contribution
Introduces EVE, a generator-verifier system that boosts generative policy performance through test-time verification with zero-shot vision-language models (VLMs), without retraining.
Findings
Consistently improves task success rates across manipulation tasks.
Effectively integrates multiple verifier agents for action refinement.
Provides practical guidelines for scalable generator-verifier system design.
Abstract
Visuomotor policies based on generative architectures such as diffusion and flow matching have shown strong performance but degrade under distribution shifts, demonstrating limited recovery capabilities without costly finetuning. In the language modeling domain, test-time compute scaling has revolutionized the reasoning capabilities of modern LLMs by leveraging additional inference-time compute for candidate solution refinement. These methods typically leverage foundation models as verification modules in a zero-shot manner to synthesize improved candidate solutions. In this work, we hypothesize that generative policies can similarly benefit from additional inference-time compute that employs zero-shot VLM-based verifiers. A systematic analysis of improving policy performance through the generation-verification framework remains relatively underexplored in the current literature. To…
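The abstract describes spending extra inference-time compute on generating candidate actions and letting zero-shot verifiers judge them. One common way to instantiate that idea is best-of-N selection: sample several candidates from the policy, score each with every verifier, and execute the highest-scoring one. The sketch below illustrates only that pattern and is not the authors' implementation. `select_action`, the `(observation, action) -> score` verifier signature, the simple averaging across verifiers, and the toy policy and verifiers are all hypothetical stand-ins. In EVE the verifiers are VLM-based agents, and the paper may aggregate or refine candidates differently.

```python
import random
from typing import Callable, Sequence

# Hypothetical types: an observation is a dict, an action is a list of floats.
Observation = dict
Action = list[float]
# A generative policy draws one stochastic action sample per call.
Policy = Callable[[Observation, random.Random], Action]
# A verifier maps (observation, candidate action) to a score; higher is better.
Verifier = Callable[[Observation, Action], float]


def select_action(
    obs: Observation,
    policy: Policy,
    verifiers: Sequence[Verifier],
    num_candidates: int = 8,
    seed: int = 0,
) -> tuple[Action, float]:
    """Best-of-N selection: sample N candidates from the policy and return
    the one with the highest mean verifier score, together with that score."""
    if num_candidates < 1 or not verifiers:
        raise ValueError("need at least one candidate and one verifier")
    rng = random.Random(seed)
    candidates = [policy(obs, rng) for _ in range(num_candidates)]

    def mean_score(action: Action) -> float:
        # Combine several verifiers by plain averaging (an assumption).
        return sum(v(obs, action) for v in verifiers) / len(verifiers)

    scored = [(mean_score(a), a) for a in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


# Toy usage: a noisy 1-D "policy" stands in for a diffusion or flow sampler,
# and two hand-written scoring functions stand in for VLM verifiers.
def toy_policy(obs: Observation, rng: random.Random) -> Action:
    return [rng.gauss(0.0, 1.0)]


def distance_verifier(obs: Observation, action: Action) -> float:
    # Prefers actions close to the target encoded in the observation.
    return -abs(action[0] - obs["target"])


def sign_verifier(obs: Observation, action: Action) -> float:
    # Rewards actions that move in the same direction as the target.
    return 1.0 if action[0] * obs["target"] > 0 else 0.0


obs = {"target": 0.5}
best, score = select_action(
    obs, toy_policy, [distance_verifier, sign_verifier], num_candidates=16
)
print(best, score)
```

Averaging is simply the most basic way to combine several verifiers; weighted votes, ranking, or using verifier feedback to resample or refine candidates are equally plausible designs, and which one EVE uses should be checked against the paper itself.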
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Robot Manipulation and Learning
