Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games

Jingran Zhang; Ning Li; Justin Cui

arXiv:2510.26298 · cs.CL · October 31, 2025

TL;DR

This paper evaluates OpenAI's ChatGPT Atlas on web-based games. It finds that Atlas performs well on logical reasoning tasks but struggles in real-time environments that demand precise motor control, which illustrates both its potential and its current constraints.

Contribution

The paper provides an early empirical assessment of Atlas's web interaction capabilities in dynamic environments, using browser games as test scenarios.

Findings

1. Atlas excels at logical reasoning tasks such as Sudoku.
2. Atlas struggles with real-time games that require precise timing.
3. Performance varies significantly across game types.

Abstract

OpenAI's ChatGPT Atlas introduces new capabilities for web interaction, enabling the model to analyze webpages, process user intents, and execute cursor and keyboard inputs directly within the browser. While its capacity for information retrieval tasks has been demonstrated, its performance in dynamic, interactive environments remains less explored. In this study, we conduct an early evaluation of Atlas's web interaction capabilities using browser-based games as test scenarios, including Google's T-Rex Runner, Sudoku, Flappy Bird, and Stein.world. We employ in-game performance scores as quantitative metrics to assess performance across different task types. Our results show that Atlas performs strongly in logical reasoning tasks like Sudoku, completing puzzles significantly faster than human baselines, but struggles substantially in real-time games requiring precise timing and motor…