How Far Can We Go with Pixels Alone? A Pilot Study on Screen-Only Navigation in Commercial 3D ARPGs
Kaijie Xu, Mustafa Bugti, Clark Verbrugge

TL;DR
This pilot study tests whether a screen-only, purely vision-based agent can navigate complex 3D game levels. The agent traverses most required segments of Dark Souls-style linear levels, but limitations of the underlying visual model prevent comprehensive auto-navigation.
Contribution
The paper builds a screen-only navigation agent on top of an existing open-source visual affordance detector, providing a baseline and evaluation protocol for visual navigation in complex 3D games.
Findings
The agent can traverse most required segments in pilot tests
The visual affordance model enables meaningful navigation behavior
Limitations of the visual model hinder comprehensive auto-navigation
Abstract
Modern 3D game levels rely heavily on visual guidance, yet the navigability of level layouts remains difficult to quantify. Prior work either simulates play in simplified environments or analyzes static screenshots for visual affordances, but neither setting faithfully captures how players explore complex, real-world game levels. In this paper, we build on an existing open-source visual affordance detector and instantiate a screen-only exploration and navigation agent that operates purely from visual affordances. Our agent consumes live game frames, identifies salient interest points, and drives a simple finite-state controller over a minimal action space to explore Dark Souls-style linear levels and attempt to reach expected goal regions. Pilot experiments show that the agent can traverse most required segments and exhibits meaningful visual navigation behavior, but also highlight that…
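The control loop described in the abstract (detect interest points in a frame, then let a finite-state controller pick from a small action set) can be illustrated with a minimal sketch. This is not the authors' implementation: the two states, the three actions, the `InterestPoint` fields, and the thresholds `min_score` and `center_tol` are all hypothetical choices made for illustration, and the detector output is assumed to be a list of scored points with a normalized horizontal position.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    SCAN = auto()      # rotate the camera, looking for an interest point
    APPROACH = auto()  # steer toward the currently chosen interest point


class Action(Enum):
    TURN_LEFT = auto()
    TURN_RIGHT = auto()
    FORWARD = auto()


@dataclass
class InterestPoint:
    x: float      # assumed: horizontal position in the frame, normalized to [0, 1]
    score: float  # assumed: detector confidence in [0, 1]


def step(state: State, points: List[InterestPoint],
         min_score: float = 0.5, center_tol: float = 0.1) -> Tuple[State, Action]:
    """Return (next_state, action) for one frame of detections."""
    # Hysteresis: once approaching, accept weaker detections so that a
    # momentary drop in confidence does not send the agent back to scanning.
    threshold = min_score if state is State.SCAN else min_score * 0.5
    candidates = [p for p in points if p.score >= threshold]
    if not candidates:
        # Nothing salient in view: keep rotating.
        return State.SCAN, Action.TURN_RIGHT

    # Steer until the strongest point is near the frame center, then advance.
    target = max(candidates, key=lambda p: p.score)
    offset = target.x - 0.5
    if offset < -center_tol:
        return State.APPROACH, Action.TURN_LEFT
    if offset > center_tol:
        return State.APPROACH, Action.TURN_RIGHT
    return State.APPROACH, Action.FORWARD
```

In a live setting, `step` would be called once per captured frame, with the returned action translated into key or controller input. Frame capture and input injection are omitted here because the source does not specify how they are done.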
Taxonomy
Topics: Robotics and Sensor-Based Localization · Artificial Intelligence in Games · Multimodal Machine Learning Applications
