Scaling Laws for Imitation Learning in Single-Agent Games

Jens Tuyls; Dhruv Madeka; Kari Torkkola; Dean Foster; Karthik Narasimhan; Sham Kakade

arXiv:2307.09423 · cs.LG · December 20, 2024

TL;DR

This paper investigates how scaling up model and data size in imitation learning improves performance in single-agent games, demonstrating power-law scaling and outperforming previous methods in challenging environments like NetHack.

Contribution

It provides the first detailed analysis of scaling laws in imitation learning for single-agent games, showing that larger models and datasets lead to significant performance gains.

Findings

1. IL loss and mean return scale smoothly with compute (FLOPs)
2. Power laws describe the relationship between compute and performance
3. Scaled IL agents outperform prior state-of-the-art in NetHack by 1.5x
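
To make the second finding concrete, here is a minimal sketch of how a power law of the form loss ≈ a · C^b can be fitted to (compute, loss) pairs with a straight-line fit in log-log space. The function name `fit_power_law`, the synthetic FLOP budgets, and the coefficients 50.0 and -0.1 are illustrative assumptions, not values or code from the paper. The paper's own fitting procedure and functional form (for example, whether it includes an irreducible-loss term) may differ.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**b by least squares on log-transformed data.

    Hypothetical helper for illustration; it assumes a pure power law
    with no additive offset.
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A power law is a straight line in log-log space:
    # log(loss) = b * log(compute) + log(a)
    slope, intercept = np.polyfit(log_c, log_l, 1)
    return float(np.exp(intercept)), float(slope)


# Synthetic data generated from an assumed power law (not from the paper).
flops = np.array([1e13, 1e14, 1e15, 1e16, 1e17])
loss = 50.0 * flops ** -0.1

a, b = fit_power_law(flops, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements the points will not lie exactly on a line, so it is worth inspecting the residuals in log-log space before extrapolating to larger compute budgets.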

Abstract

Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for…

Peer Reviews

Decision: Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- The objective of the paper -- studying scaling laws in BC -- is timely and I'm sure would capture the interest of many researchers in and around imitation learning.
- The paper presents a suite of experiments, and subsequent analysis, that required substantial effort to set up, and significant compute to run. I see value in sharing this with other researchers.
- Presentation is mostly clear. The main results, particularly Figure 1, come out very clean.
- The paper offers something of a counter

Weaknesses

I see several red flags in the paper -- both in the framing of the paper and the experimental design. These are severe enough that I wouldn't recommend acceptance at this time.

Major -- framing

- A casual reader could be led by the framing of the paper to believe that scaling laws hold generally across imitation learning. However, a more critical read leaves one with the impression the authors set out deliberately to uncover environments with relationships that could be classified as scaling la

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

- The paper is exceptionally well-written, introducing a novel study.
- It presents scaling trends on various metrics such as dev loss, returns, and final test returns.
- Scaled+BC demonstrates substantial improvement over existing methods in NetHack. This indicates that BC remains a robust baseline when applied with a suitable model and sample size.

Weaknesses

While there are no significant weaknesses identified, including discussion points from the questions section (below) could enhance the robustness of the submission.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. This paper is the first to study the scaling law of imitation learning in the domain of games. This paper confirms that scaling up the model size and data size can result in better agents, which is an important conclusion for IL.
2. This paper conducts systematic experiments to validate the scaling law. The experimental design is sound and the experimental results are convincing.

Weaknesses

1. Given existing works studying the scaling law in LLMs, the novelty of investigating the scaling law of BC in games is limited. As claimed in this paper, LLMs use the same MLE objective as BC. Thus, the only difference between this paper and existing works is the agent domain. As a result, it is not very surprising that the single-agent games domain can exhibit a similar scaling law. A more interesting direction is to investigate the scaling law of another class of IL methods named adversarial

Code & Models

Repositories

princeton-nlp/il-scaling-in-games (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization

Methods: None · Focus