Digi-Q: Learning Q-Value Functions for Training Device-Control Agents

Hao Bai; Yifei Zhou; Li Erran Li; Sergey Levine; Aviral Kumar

arXiv:2502.15760 · cs.LG · February 25, 2025

TL;DR

Digi-Q introduces a scalable offline RL approach using VLM-based Q-functions for device control, improving performance and reducing the need for environment interaction in dynamic, real-world settings.

Contribution

The paper presents Digi-Q, a novel method for training Q-functions with frozen VLM features via offline TD learning, enabling effective policy extraction without environment interaction.
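To make "offline TD learning on frozen features" concrete, here is a minimal sketch, not the paper's implementation. It assumes the frozen VLM has already mapped each (state, action) pair to a fixed feature vector, so only a small linear Q-head is trained from a logged dataset. The function names, the one-hot toy features, the hyperparameters, and the use of a max over next-step candidates (a Q-learning-style target) are all illustrative assumptions.

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td_train(transitions, dim, gamma=0.9, lr=0.1, epochs=200, seed=0):
    """Fit a linear Q-head on frozen features with TD(0) updates.

    Each transition is (phi, reward, next_phis, done):
      phi       - frozen feature vector of the logged (state, action) pair
      next_phis - feature vectors of candidate actions in the next state
    The features are never updated; only the head weights w change.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)
        for phi, reward, next_phis, done in data:
            # Bootstrapped target: r + gamma * max_a' Q(s', a'); zero at terminals.
            if done or not next_phis:
                next_q = 0.0
            else:
                next_q = max(dot(w, p) for p in next_phis)
            td_error = reward + gamma * next_q - dot(w, phi)
            # Semi-gradient step on the head only.
            w = [wi + lr * td_error * xi for wi, xi in zip(w, phi)]
    return w


# Hypothetical logged data: one step leads to a state with a good and a bad action.
logged = [
    ([1.0, 0.0, 0.0], 0.0, [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], False),
    ([0.0, 1.0, 0.0], 1.0, [], True),   # good action: task succeeds
    ([0.0, 0.0, 1.0], 0.0, [], True),   # bad action: task fails
]
w = td_train(logged, dim=3)
```

No environment calls appear anywhere in the loop: every update reuses the logged transitions, which is what makes the procedure offline.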

Findings

1. Digi-Q achieves a 21.2% improvement over prior methods.

2. In some cases, it matches state-of-the-art RL methods.

3. The approach improves scalability and reduces compute requirements.

Abstract

While a number of existing approaches for building foundation model agents rely on prompting or fine-tuning with human demonstrations, these techniques are not sufficient in dynamic environments (e.g., mobile device control). On-policy reinforcement learning (RL) should address these limitations, but collecting actual rollouts in an environment is often undesirable in truly open-ended agentic problems such as mobile device control or interacting with humans, where each unit of interaction is associated with a cost. In such scenarios, a method for policy learning that can utilize off-policy experience by learning an action-value function is much more effective. In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions which are then used to extract the agent policy. We study our approach in the mobile device control setting. Digi-Q trains the Q-function…
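The abstract is cut off before it describes how the policy is extracted from the Q-function. As a hedged illustration only, one common way to use a learned Q-function is to score several candidate actions and keep the highest-scoring one. The function `best_of_n`, the toy Q-table, and the action names below are hypothetical and are not taken from the paper.

```python
def best_of_n(state, candidates, q_fn):
    """Return the candidate action with the highest estimated Q-value."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda action: q_fn(state, action))


# Hypothetical Q-values standing in for a trained Q-function.
toy_q = {
    ("home", "open_settings"): 0.8,
    ("home", "open_camera"): 0.1,
    ("home", "scroll_down"): 0.3,
}


def q_fn(state, action):
    """Look up a toy Q-value; unseen pairs default to 0.0."""
    return toy_q.get((state, action), 0.0)


chosen = best_of_n("home", ["open_camera", "open_settings", "scroll_down"], q_fn)
```

In a real agent the candidates would come from sampling the policy model and `q_fn` would be the learned Q-function; both are stubbed here so the selection step can be read in isolation.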

Code & Models

Repositories

digirl-agent/digiq
PyTorch, official

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning