A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Jake Grigsby; Yanjun Qi

arXiv:2110.04698 · cs.LG · December 12, 2023 · 1 citation

TL;DR

This paper evaluates advantage-filtered behavioral cloning in high-noise datasets, demonstrating that with prioritized sampling, agents can learn state-of-the-art policies even when expert data is vastly outnumbered by sub-optimal samples.

Contribution

The paper combines advantage filtering with re-purposed prioritized experience sampling so that offline agents can learn from datasets dominated by sub-optimal noise.

Findings

01. Agents achieved state-of-the-art performance on benchmark tasks.

02. Prioritized sampling located expert-level demonstrations among millions of low-performance samples.

03. Agents learned effectively from datasets in which expert actions were outnumbered nearly 65:1.

Abstract

Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
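
The idea of filtering by advantage and then sampling by priority can be illustrated with a small sketch. This is not the authors' implementation: the binary filter (keep a sample only when its estimated advantage is positive) is one common choice, and the toy dataset, the perfectly separated advantage values, the batch size of 256, and the names `binary_advantage_filter`, `filtered_bc_loss`, `sampling_probabilities`, and `make_toy_dataset` are all hypothetical, chosen only for illustration. The sketch compares how many expert samples land in a batch under uniform sampling versus sampling in proportion to the filter value, on a dataset with a 65:1 noise-to-expert ratio.

```python
import numpy as np


def binary_advantage_filter(advantages):
    """Return 1.0 where the estimated advantage is positive, else 0.0."""
    return (advantages > 0.0).astype(np.float64)


def filtered_bc_loss(log_probs, advantages):
    """Behavioral cloning loss (negative log-likelihood) masked by the filter."""
    weights = binary_advantage_filter(advantages)
    return float(-(weights * log_probs).mean())


def sampling_probabilities(advantages, eps=1e-3):
    """Sampling distribution proportional to the filter value plus a small floor.

    The floor `eps` (an assumption) keeps every sample reachable.
    """
    priorities = binary_advantage_filter(advantages) + eps
    return priorities / priorities.sum()


def make_toy_dataset(n_expert=100, noise_ratio=65, seed=0):
    """Toy data: expert samples get positive advantages, noise gets negative.

    Real advantage estimates are noisy; this clean separation is an
    assumption made to keep the example short.
    """
    rng = np.random.default_rng(seed)
    n_noise = n_expert * noise_ratio
    is_expert = np.concatenate(
        [np.ones(n_expert, dtype=bool), np.zeros(n_noise, dtype=bool)]
    )
    advantages = np.where(
        is_expert,
        rng.uniform(0.1, 1.0, size=is_expert.size),
        rng.uniform(-1.0, -0.1, size=is_expert.size),
    )
    return is_expert, advantages, rng


is_expert, advantages, rng = make_toy_dataset()
batch_size = 256  # hypothetical batch size

# Uniform sampling: expert samples appear at roughly 1 / (65 + 1) of the batch.
uniform_idx = rng.integers(0, is_expert.size, size=batch_size)
uniform_frac = float(is_expert[uniform_idx].mean())

# Prioritized sampling: draw indices in proportion to the filter value.
probs = sampling_probabilities(advantages)
prioritized_idx = rng.choice(is_expert.size, size=batch_size, p=probs)
prioritized_frac = float(is_expert[prioritized_idx].mean())

print(f"expert fraction, uniform batch:     {uniform_frac:.3f}")
print(f"expert fraction, prioritized batch: {prioritized_frac:.3f}")
```

Under uniform sampling almost every sample in a batch is masked out by the filter, so little gradient signal remains. Sampling in proportion to the filter value concentrates each batch on the samples the filter would keep, which is the intuition behind pairing the two techniques.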

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function