Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
Hao Bai, Yifei Zhou, Li Erran Li, Sergey Levine, Aviral Kumar

TL;DR
Digi-Q introduces a scalable offline RL approach using VLM-based Q-functions for device control, improving performance and reducing the need for environment interaction in dynamic, real-world settings.
Contribution
The paper presents Digi-Q, a novel method for training Q-functions with frozen VLM features via offline TD learning, enabling effective policy extraction without environment interaction.
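To make the idea of offline TD learning on frozen features concrete, here is a minimal sketch. It is not the paper's implementation or its exact objective. It assumes the frozen VLM has already turned each (state, action) pair into a fixed feature vector, uses a linear Q head in place of whatever head the authors train, and applies a generic TD(0) update with a periodically synced target copy. The function name, hyperparameters, and toy data are all hypothetical.

```python
import numpy as np

def td_train_q_head(features, next_features, rewards, dones,
                    gamma=0.9, lr=0.5, steps=500, target_sync=50):
    """Fit a linear Q head on frozen features with TD(0) (illustrative only).

    features[i]      : frozen embedding of (state_i, action_i)
    next_features[i] : frozen embedding of the next (state, action) pair,
                       taken from the logged data (all zeros if terminal)
    """
    n, d = features.shape
    w = np.zeros(d)          # trainable Q-head weights
    w_target = w.copy()      # slowly updated copy used for bootstrapping
    for step in range(steps):
        # Bootstrapped target; no gradient flows through w_target.
        next_q = next_features @ w_target
        target = rewards + gamma * (1.0 - dones) * next_q
        td_error = features @ w - target
        # Gradient of the mean squared TD error (up to a constant factor).
        w -= lr * (features.T @ td_error) / n
        if (step + 1) % target_sync == 0:
            w_target = w.copy()
    return w

# Toy offline dataset (an assumption for illustration): a 3-step chain with
# one-hot "frozen features" and a reward of 1 on the final transition.
features = np.eye(3)
next_features = np.array([[0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0],
                          [0.0, 0.0, 0.0]])
rewards = np.array([0.0, 0.0, 1.0])
dones = np.array([0.0, 0.0, 1.0])

w = td_train_q_head(features, next_features, rewards, dones)
print(w)  # approximately [0.81, 0.9, 1.0], the discounted returns
```

Only the small head `w` is updated, and only from logged transitions, which is the general reason such a setup needs no new environment interaction and far less compute than updating a full model.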
Findings
Digi-Q achieves a 21.2% improvement over prior methods.
It matches state-of-the-art RL methods in some cases.
Because the Q-function is trained on frozen VLM features, the approach improves scalability and reduces compute requirements.
Abstract
While a number of existing approaches for building foundation model agents rely on prompting or fine-tuning with human demonstrations, these techniques are not sufficient in dynamic environments (e.g., mobile device control). On-policy reinforcement learning (RL) should address these limitations, but collecting actual rollouts in an environment is often undesirable in truly open-ended agentic problems such as mobile device control or interacting with humans, where each unit of interaction is associated with a cost. In such scenarios, a method for policy learning that can utilize off-policy experience by learning a trained action-value function is much more effective. In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions which are then used to extract the agent policy. We study our approach in the mobile device control setting. Digi-Q trains the Q-function…
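The abstract excerpt above is truncated before it explains how the policy is extracted, so the following is only a generic illustration of one common way to use a learned Q-function without touching the environment: score several candidate actions for a state and keep the highest-scoring one as an imitation target. The function `rank_candidates`, the toy Q-table, and the action names are hypothetical and are not taken from the paper.

```python
import numpy as np

def rank_candidates(q_fn, state, candidates):
    """Score each candidate action with q_fn and return (best_action, scores)."""
    scores = np.array([q_fn(state, a) for a in candidates], dtype=float)
    best_index = int(np.argmax(scores))
    return candidates[best_index], scores

# Hypothetical stand-in for a trained Q-function: a lookup table.
toy_q = {
    ("home_screen", "scroll_down"): 0.2,
    ("home_screen", "tap_settings"): 0.7,
    ("home_screen", "press_back"): 0.1,
}

def q_fn(state, action):
    return toy_q[(state, action)]

# Candidates would normally be sampled from the current policy; here they
# are hard-coded for illustration.
candidates = ["scroll_down", "tap_settings", "press_back"]
best_action, scores = rank_candidates(q_fn, "home_screen", candidates)

# The (state, best_action) pair can then serve as a supervised target.
imitation_pair = ("home_screen", best_action)
print(imitation_pair)
```

Everything in this loop runs on stored states and model-generated candidates, so no new rollouts are collected while the policy is being improved.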
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
