Aligning Agents like Large Language Models

Adam Jelley; Yuhan Cao; Dave Bignell; Amos Storkey; Sam Devlin; Tabish Rashid

arXiv:2406.04208·cs.LG·December 30, 2025

Open Access · 3 Reviews

TL;DR

This paper proposes training decision-making agents in complex environments using methods inspired by Large Language Model training, aiming for more general, robust, and aligned behaviors, demonstrated through a 3D video game proof-of-concept.

Contribution

It introduces a novel approach to training agents by applying LLM training procedures, providing a proof-of-concept and insights for future development of generalizable agents.

Findings

01

LLM training procedures can be adapted for agents in 3D environments

02

Stage-wise analysis of LLM training pipeline improves agent training

03

Initial results show promising generalization capabilities

Abstract

Training agents to act competently in complex 3D environments from high-dimensional visual information is challenging. Reinforcement learning is conventionally used to train such agents, but requires a carefully designed reward function, and is difficult to scale to obtain robust agents that generalize to new tasks. In contrast, Large Language Models (LLMs) demonstrate impressively general capabilities resulting from large-scale pre-training and post-training alignment, but struggle to act in complex environments. This position paper draws explicit analogies between decision-making agents and LLMs, and argues that agents should be trained like LLMs to achieve more general, robust, and aligned behaviors. We provide a proof-of-concept to demonstrate how the procedure for training LLMs can be used to train an agent in a 3D video game environment from pixels. We investigate the importance…

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

- The overall motivation of studying how well imitation learning + RL with preference-based reward models applies to aligning generally capable game agents is interesting and meaningful.
- The additional experiments showing the efficacy of combining online learning with the learned reward model and additional finetuning on the top 20% of offline trajectories (as ranked by the same reward model) are interesting and a meaningful contribution, as most prior works only focus on either the offline or on
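
The filtering step this reviewer mentions (keeping the top 20% of offline trajectories as ranked by a reward model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `select_top_fraction` and `toy_reward_model` are hypothetical names, and the toy scorer simply sums per-step values in place of a learned reward model.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_fraction(
    trajectories: Sequence[T],
    score: Callable[[T], float],
    fraction: float = 0.2,
) -> List[T]:
    """Return the highest-scoring `fraction` of trajectories, best first."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not trajectories:
        return []
    # Rank every trajectory by its score, highest first.
    ranked = sorted(trajectories, key=score, reverse=True)
    # Always keep at least one trajectory so the fine-tuning set is non-empty.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def toy_reward_model(trajectory: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned reward model (assumption)."""
    return float(sum(trajectory))


# Made-up offline data: each trajectory is a list of per-step values.
offline_trajectories = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.0, 0.1], [0.7, 0.9]]
fine_tuning_set = select_top_fraction(offline_trajectories, toy_reward_model)
```

In a real pipeline the scorer would be the trained reward model and the selected trajectories would feed a further supervised fine-tuning pass; both of those pieces are outside this sketch.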

Weaknesses

- The paper shows that the same overall process for finetuning LLMs can also be applied to agents within this game. However, the overall method has limited novelty: as noted in the related works section, the general process of bootstrapping RL with imitation learning and further finetuning agent behaviors with reward models learned from human preferences have both been studied in prior work on training RL agents outside of LLMs.
- The authors note that unlike prior work, their goal is to train "a

Reviewer 02 · Rating 3: reject, not good enough · Confidence 3

Strengths

+ The topic studied here is important. Despite the success of the training-tuning-alignment scheme of large language models, its effectiveness in other domains has not been fully verified. This paper makes a valuable contribution to exploring this idea for agent learning in game AI, and the results do show some promise. I believe it could draw the interest of audiences from both the LLM and game AI communities (and possibly more).
+ Although the task considered in this paper can be relatively simpl

Weaknesses

Overall, I think the idea that this manuscript tries to put forward is clear and neat, but it can be a bit premature in terms of the breadth and depth of the investigation. Some substantial augmentation of the experiments should be done before it can be accepted by a major conference. Here are some suggestions:
- Breadth: the authors have claimed they "investigate how the procedure for aligning LLMs can be applied to aligning agents from pixels in a complex 3D environment". Although I do agree t

Reviewer 03 · Rating 3: reject, not good enough · Confidence 3

Strengths

The paper is nicely written and easy to follow. The proposed game is interesting and differs from those typically studied in the community.

Weaknesses

No human behaviour data is going to be released. The game environment is not going to be released. This makes reproducing the results impossible. Limited details are provided about the environment, such as the internal mechanisms of the game and how it simulates physics. These details are essential given that the game is not widely known in the research community. The task is not particularly challenging and focuses on a niche mechanic in the game. The task is only focusing on a sma

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN