Scaling Laws for Imitation Learning in Single-Agent Games
Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

TL;DR
This paper investigates how scaling up model and data size in imitation learning improves performance in single-agent games, demonstrating power-law scaling and outperforming previous methods in challenging environments like NetHack.
Contribution
It provides the first detailed analysis of scaling laws in imitation learning for single-agent games, showing that larger models and datasets lead to significant performance gains.
Findings
IL loss and mean return scale smoothly with compute (FLOPs)
Power laws describe the relationship between compute and performance
Scaled IL agents outperform prior state-of-the-art in NetHack by 1.5x
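To make the power-law finding concrete, the following is a minimal sketch of how such a relationship can be fitted, assuming the simple form loss ≈ a · C^b with compute C in FLOPs. The data is synthetic and the constants (`true_a`, `true_b`) and the helper `fit_power_law` are illustrative assumptions, not values or code from the paper; the paper's own fits may use a different functional form (for example, one with an irreducible-loss offset).

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**b by least squares in log-log space.

    Hypothetical helper for illustration; returns (a, b).
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A power law is a straight line in log-log space:
    # log(loss) = log(a) + b * log(compute)
    slope, intercept = np.polyfit(log_c, log_l, 1)
    return float(np.exp(intercept)), float(slope)


# Synthetic data only: these numbers are made up, not taken from the paper.
rng = np.random.default_rng(0)
flops = np.logspace(13, 18, 12)          # compute budgets in FLOPs
true_a, true_b = 50.0, -0.1              # assumed ground-truth constants
noise = np.exp(rng.normal(0.0, 0.01, flops.size))
loss = true_a * flops**true_b * noise    # noisy power-law losses

a, b = fit_power_law(flops, loss)

# Extrapolate the fitted law to a larger, unseen compute budget.
predicted_loss = a * (1e19) ** b
print(f"a = {a:.2f}, b = {b:.4f}, predicted loss at 1e19 FLOPs = {predicted_loss:.4f}")
```

A negative exponent `b` means loss falls as compute grows. Fitting in log-log space keeps the regression linear, at the cost of weighting relative rather than absolute errors.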
Abstract
Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for…
Peer Reviews
Decision: Submitted to ICLR 2024
- The objective of the paper -- studying scaling laws in BC -- is timely and I'm sure would capture the interest of many researchers in and around imitation learning.
- The paper presents a suite of experiments, and subsequent analysis, that required substantial effort to set up, and significant compute to run. I see value in sharing this with other researchers.
- Presentation is mostly clear. The main results, particularly Figure 1, come out very clean.
- The paper offers something of a counter
I see several red flags in the paper -- both in the framing of the paper and the experimental design. These are severe enough that I wouldn't recommend acceptance at this time.
Major -- framing
- A casual reader could be led by the framing of the paper to believe that scaling laws hold generally across imitation learning. However, a more critical read leaves one with the impression the authors set out deliberately to uncover environments with relationships that could be classified as scaling la
- The paper is exceptionally well-written, introducing a novel study.
- It presents scaling trends on various metrics such as dev loss, returns, and final test returns.
- Scaled+BC demonstrates substantial improvement over existing methods in NetHack. This indicates that BC remains a robust baseline when applied with a suitable model and sample size.
While there are no significant weaknesses identified, including discussion points from the questions section (below) could enhance the robustness of the submission.
1. This paper is the first to study the scaling law of imitation learning in the domain of games. This paper confirms that scaling up the model size and data size can result in better agents, which is an important conclusion for IL.
2. This paper conducts systematic experiments to validate the scaling law. The experimental design is sound and the experimental results are convincing.
1. Given existing works studying the scaling law in LLMs, the novelty of investigating the scaling law of BC in games is limited. As claimed in this paper, LLMs use the same MLE objective as BC. Thus, the only difference between this paper and existing works is the agent domain. As a result, it is not very surprising that the single-agent games domain can exhibit a similar scaling law. A more interesting direction is to investigate the scaling law of another class of IL methods named adversarial
Taxonomy
Topics: Video Analysis and Summarization
Methods: None
