Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Rohin Shah; Steven H. Wang; Cody Wild; Stephanie Milani; Anssi Kanervisto; Vinicius G. Goecks; Nicholas Waytowich; David Watkins-Valls; Bharat Prakash; Edmund Mills; Divyansh Garg; Alexander Fries; Alexandra Souly; Chan Jun Shern; Daniel del Castillo; Tom Lieberum

arXiv:2204.07123 · cs.AI · April 15, 2022 · 1 citation

Open Access

TL;DR

This paper reviews the MineRL BASALT competition held at NeurIPS 2021, which aimed to advance learning from human feedback (LfHF) techniques for open-world tasks in Minecraft. It highlights the diverse approaches that teams took and how the results validated the choice of tasks.

Contribution

The paper provides a retrospective analysis of the first MineRL BASALT competition, covering the diverse algorithms that teams developed, evidence that the task selection was sound, and insights for future improvements.

Findings

01. Diverse LfHF algorithms achieved similar performance.

02. Different approaches excelled on different tasks.

03. The competition validated the task design and approach diversity.

Abstract

We held the first-ever MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) Competition at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks. Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft, and allowed participants to use any approach they wanted to build agents that could accomplish the tasks. Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types. The three winning teams implemented significantly different approaches while achieving similar performance. Interestingly, their approaches performed well on different tasks, validating our choice of…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Data Stream Mining Techniques