Best-of-Q: Improving VLM agents with Q-function Action Ranking at Inference
Emilien Biré, María Santos, Kai Yuan

TL;DR
This paper introduces a novel inference-time method for improving vision-language model agents by reranking candidate actions with a lightweight Q-function, significantly boosting success rates without retraining the policy.
Contribution
The main novelty is applying a Q-function during inference to rerank actions, avoiding the need for policy retraining or fine-tuning in dynamic environments.
Findings
Qwen2.5-VL-7B agent success rate increased from 38.8% to 55.7%.
Proprietary GPT-4.1 agent success rate increased from 82.4% to 88.8%.
The method significantly improves agent performance on the WebVoyager benchmark.
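To put the two reported improvements on a common footing, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each agent. Only the four success rates come from the findings above; the helper name `gains` is a hypothetical introduced for this illustration.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; both values are rounded to one decimal.
    """
    absolute = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return absolute, relative


# Success rates (percent) as reported in the findings.
reported = {
    "Qwen2.5-VL-7B": (38.8, 55.7),
    "GPT-4.1": (82.4, 88.8),
}

for agent, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{agent}: +{absolute} points ({relative}% relative)")
```

Under this calculation the smaller open model gains 16.9 points (about 43.6% relative), while the stronger proprietary agent gains 6.4 points (about 7.8% relative).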
Abstract
Vision-Language Models (VLMs) have become powerful backbones for agents that autonomously operate in digital environments like the web and operating systems. However, these models adapt poorly to fast-changing environments like the web; fine-tuning can alleviate this, but it requires expensive model training and data collection. In this work, we introduce a novel paradigm for enhancing agentic VLM policies at inference without policy retraining. Fundamentally, our approach decouples the VLM's role as a high-capacity action proposer from the final action selection mechanism. We keep the VLM policy frozen and use it to generate a set of candidate actions for a given state. Then, a lightweight, offline-trained Q-function reranks these candidates, and the agent executes the action with the highest estimated value. The main contribution is to apply the Q-function directly during…
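The selection step described in the abstract (a frozen policy proposes candidates, a Q-function scores them, and the highest-scoring action is executed) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `best_of_q`, `propose`, `q_value`, and the toy actions and scores are all hypothetical stand-ins, and the abstract does not specify how candidates are sampled or how the Q-function is trained.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def best_of_q(
    state: State,
    propose: Callable[[State, int], Sequence[Action]],
    q_value: Callable[[State, Action], float],
    n_candidates: int = 5,
) -> Action:
    """Pick the candidate action with the highest estimated Q-value.

    `propose` stands in for the frozen VLM policy and `q_value` for the
    offline-trained Q-function; both are assumptions made for this sketch.
    """
    candidates = propose(state, n_candidates)
    if not candidates:
        raise ValueError("the policy proposed no candidate actions")
    # Rerank: the policy only proposes, the Q-function makes the final choice.
    return max(candidates, key=lambda action: q_value(state, action))


# Toy stand-ins so the sketch runs end to end (hypothetical actions and scores).
def toy_propose(state: str, n: int) -> list[str]:
    return ["click(search_box)", "scroll(down)", "type('flights to Paris')"][:n]


TOY_SCORES = {
    "click(search_box)": 0.62,
    "scroll(down)": 0.18,
    "type('flights to Paris')": 0.41,
}


def toy_q(state: str, action: str) -> float:
    return TOY_SCORES[action]


chosen = best_of_q("homepage screenshot", toy_propose, toy_q, n_candidates=3)
print(chosen)
```

With these toy scores the agent executes `click(search_box)`. Because the policy is only called to propose candidates, swapping in a better Q-function changes behavior without touching the policy weights, which is the decoupling the abstract describes.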
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
