Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin

TL;DR
This paper introduces Odysseus, a framework that trains vision-language models (VLMs) with reinforcement learning for long-horizon decision-making in video games, targeting episodes of 100+ turns in Super Mario Land with improved training stability and sample efficiency.
Contribution
It presents an adapted variant of PPO with a lightweight turn-level critic, which improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++, and it demonstrates the effectiveness of pretrained VLMs in long-horizon game environments.
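To illustrate what a turn-level critic could look like in practice, here is a minimal sketch that computes advantages with one reward and one value estimate per turn, rather than per token. It uses standard generalized advantage estimation (GAE), which is an assumption: this page does not state which estimator the paper uses. The function names, the `gamma` and `lam` defaults, and the step that copies each turn's advantage to every token of that turn are all hypothetical choices made for illustration.

```python
from typing import List


def turn_level_gae(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
    lam: float = 0.95,    # assumed GAE lambda, not taken from the paper
) -> List[float]:
    """Compute GAE advantages with one entry per turn.

    `values` holds one critic estimate per turn plus a final bootstrap
    value (use 0.0 when the episode has terminated), so it must be
    exactly one element longer than `rewards`.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def broadcast_to_tokens(
    advantages: List[float], tokens_per_turn: List[int]
) -> List[List[float]]:
    """Copy each turn's advantage to all tokens generated in that turn.

    This broadcast is an illustrative assumption about how a turn-level
    signal might feed a token-level PPO loss.
    """
    if len(advantages) != len(tokens_per_turn):
        raise ValueError("need one token count per turn")
    return [[adv] * n for adv, n in zip(advantages, tokens_per_turn)]
```

In a sketch like this, the critic is evaluated once per turn, so a 100-turn episode needs about 100 value estimates regardless of how many tokens the model generates in each turn.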
Findings
Odysseus achieves at least 3 times more game progress than previous models.
Pretrained VLMs significantly improve sample efficiency and reduce manual action engineering.
The framework generalizes well across different game levels and settings.
Abstract
Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning (RL) only in relatively short-horizon settings (typically around 20–30 turns). In this work, we study RL-based training of VLMs for long-horizon decision-making in Super Mario Land, a visually grounded environment requiring 100+ turns of interaction with coordinated perception, reasoning, and action. We begin with a systematic investigation of key algorithmic components and propose an adapted variant of PPO with a lightweight turn-level critic, which substantially improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++. We…
