DPO Learning with LLMs-Judge Signal for Computer Use Agents

Man Luo; David Cobbley; Xin Su; Shachar Rosenman; Vasudev Lal; Shao-Yen Tseng; Phillip Howard

arXiv:2506.03095·cs.AI·June 4, 2025

TL;DR

This paper introduces a lightweight vision-language model for GUI agents that runs entirely on local machines to preserve privacy. It is trained on synthetic interaction trajectories that an LLM-based judge automatically evaluates and filters, which improves its performance on GUI tasks.

Contribution

The work presents a novel LLM-as-Judge framework for training compact GUI agents without human labels, enabling efficient, private, and scalable local inference.

Findings

01 Outperforms existing baselines on OS-World benchmark

02 Enables privacy-preserving local GUI agent operation

03 Demonstrates effective reinforcement learning data filtering

Abstract

Computer use agents (CUAs) are systems that automatically interact with graphical user interfaces (GUIs) to complete tasks. CUAs have made significant progress with the advent of large vision-language models (VLMs). However, these agents typically rely on cloud-based inference with substantial compute demands, raising critical privacy and scalability concerns, especially when operating on personal devices. In this work, we take a step toward privacy-preserving and resource-efficient agents by developing a lightweight vision-language model that runs entirely on local machines. To train this compact agent, we introduce an LLM-as-Judge framework that automatically evaluates and filters synthetic interaction trajectories, producing high-quality data for reinforcement learning without human annotation. Experiments on the OS-World benchmark demonstrate that our fine-tuned local model…
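The abstract describes judge-based filtering only at a high level, so the following is a minimal sketch of how judge scores could be turned into preference pairs of the kind DPO training consumes. It is not the authors' pipeline. The `JudgedAction` schema, the 0-to-1 score range, the `min_margin` threshold, and the best-versus-worst pairing rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JudgedAction:
    """One candidate action for a GUI state, scored by a judge (hypothetical schema)."""
    state_id: str       # identifier of the screenshot/task state (assumed)
    action: str         # serialized candidate action (assumed format)
    judge_score: float  # assumed to lie in [0, 1], higher is better


def build_preference_pairs(
    candidates: List[JudgedAction], min_margin: float = 0.3
) -> List[Dict[str, str]]:
    """Group candidates by state and keep one (chosen, rejected) pair per state.

    A state is dropped when the judge does not separate its best and worst
    candidates by at least `min_margin` (an assumed filtering rule).
    """
    by_state: Dict[str, List[JudgedAction]] = {}
    for cand in candidates:
        by_state.setdefault(cand.state_id, []).append(cand)

    pairs: List[Dict[str, str]] = []
    for state_id, group in by_state.items():
        best = max(group, key=lambda c: c.judge_score)
        worst = min(group, key=lambda c: c.judge_score)
        if best.judge_score - worst.judge_score >= min_margin:
            pairs.append(
                {"prompt": state_id, "chosen": best.action, "rejected": worst.action}
            )
    return pairs


if __name__ == "__main__":
    demo = [
        JudgedAction("s1", "click(ok_button)", 0.9),
        JudgedAction("s1", "type('hello')", 0.2),
        JudgedAction("s2", "scroll(down)", 0.50),
        JudgedAction("s2", "click(menu)", 0.45),
    ]
    # State s2 is filtered out because the judge barely distinguishes its candidates.
    print(build_preference_pairs(demo))
```

Dropping states where the judge scores are close is one simple way to keep only confident comparisons. A real system would need to define the judge prompt, the score scale, and the threshold from the paper's full text.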

Taxonomy

Topics: Data Stream Mining Techniques · Fuzzy Logic and Control Systems · Multi-Agent Systems and Negotiation