DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Hao Bai; Yifei Zhou; Mert Cemri; Jiayi Pan; Alane Suhr; Sergey Levine; Aviral Kumar

arXiv:2406.11896 · cs.LG · June 20, 2024 · 2 cites

TL;DR

DigiRL is an autonomous reinforcement learning method for training device-control agents on real-world GUIs. It fine-tunes a pre-trained vision-language model in two stages, offline RL followed by offline-to-online RL, and outperforms prior supervised and RL approaches.

Contribution

The paper presents DigiRL, a new two-stage RL framework that fine-tunes pre-trained vision-language models for in-the-wild device control, addressing real-world stochasticity and non-stationarity.

Findings

01

Achieved a 49.5% absolute improvement in success rate on Android-in-the-Wild tasks (from 17.7% to 67.2%) over supervised fine-tuning on static human demonstrations.

02

Outperformed GPT-4V, CogAgent, and previous RL approaches in success rate.

03

Established a new state-of-the-art for in-the-wild device control agents.

Abstract

Training corpuses for vision language models (VLMs) typically lack sufficient amounts of decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks such as in-the-wild device control through graphical user interfaces (GUIs). While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their failure to deal with real-world stochasticity and non-stationarity not captured in static observational data. This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL. To do this, we build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator and develop a simple yet…
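The two-stage schedule described in the abstract (offline initialization, then further updates on fresh rollouts scored by an evaluator) can be illustrated with a toy sketch. This is not DigiRL's objective or environment: it swaps in a plain REINFORCE-style update on a one-step task, and the action set, the stub `evaluator`, and all function names are hypothetical.

```python
import math
import random

ACTIONS = ["tap", "swipe", "type"]  # hypothetical toy action set


def evaluator(action):
    """Stub standing in for a learned evaluator: binary success signal.

    Assumption for this toy task: only 'tap' completes it.
    """
    return 1.0 if action == "tap" else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update(logits, action_idx, reward, baseline, lr):
    """One REINFORCE-style step: grad of log pi(a), scaled by (reward - baseline)."""
    probs = softmax(logits)
    adv = reward - baseline
    return [
        w + lr * adv * ((1.0 if i == action_idx else 0.0) - p)
        for i, (w, p) in enumerate(zip(logits, probs))
    ]


def train_two_stage(offline_data, online_steps, lr=0.5, seed=0):
    """Return action probabilities after offline, then online, updates."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)

    # Stage 1 (offline): learn only from logged (action_idx, reward) pairs.
    baseline = sum(r for _, r in offline_data) / len(offline_data)
    for action_idx, reward in offline_data:
        logits = update(logits, action_idx, reward, baseline, lr)

    # Stage 2 (online): sample from the current policy, score the outcome
    # with the evaluator, and keep updating from the offline initialization.
    for _ in range(online_steps):
        probs = softmax(logits)
        action_idx = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = evaluator(ACTIONS[action_idx])
        logits = update(logits, action_idx, reward, baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # running-mean baseline

    return softmax(logits)


if __name__ == "__main__":
    logged = [(0, 1.0), (1, 0.0), (2, 0.0), (0, 1.0)]  # made-up offline log
    print(dict(zip(ACTIONS, train_two_stage(logged, online_steps=200))))
```

The point of the sketch is only the ordering: the online stage starts from the policy produced by the offline stage rather than from scratch, and its reward comes from an automatic evaluator instead of logged labels.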

Code & Models

Repositories

digirl-agent/digirl
pytorch

Videos

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning · SlidesLive

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics