A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets
Jake Grigsby, Yanjun Qi

TL;DR
This paper evaluates advantage-filtered behavioral cloning in high-noise datasets, demonstrating that with prioritized sampling, agents can learn state-of-the-art policies even when expert data is vastly outnumbered by sub-optimal samples.
Contribution
It combines advantage-filtered behavioral cloning with prioritized experience sampling to learn effectively from datasets dominated by sub-optimal noise.
Findings
Agents achieved state-of-the-art performance in benchmark tasks.
Prioritized sampling successfully identified expert demonstrations.
Effective learning from datasets in which expert actions are outnumbered nearly 65:1 by sub-optimal ones.
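To put the ratio in perspective, the short calculation below shows how scarce expert data is at roughly 65:1. Expert samples make up about 1.5% of the dataset, or about 3.9 samples in a uniformly drawn minibatch. The batch size of 256 is an assumption for illustration and is not taken from the paper.

```python
# Illustration only: expected number of expert samples in a uniform minibatch
# when sub-optimal samples outnumber expert samples roughly 65:1.
noise_per_expert = 65   # approximate ratio reported in the abstract
batch_size = 256        # assumed minibatch size (not from the paper)

expert_fraction = 1 / (noise_per_expert + 1)
expected_expert_per_batch = batch_size * expert_fraction

print(f"expert fraction: {expert_fraction:.4f}")
print(f"expected expert samples per uniform batch of {batch_size}: {expected_expert_per_batch:.2f}")
```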
Abstract
Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
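The idea of "first identify, then mimic" can be sketched in a few lines. This summary does not give the paper's exact filter, critic, or priority definitions, so the sketch below assumes the following. It uses a binary filter f(A) = 1[A > 0] on advantage estimates. It draws samples from a PER-style distribution with priorities (max(A, 0) + eps) ** alpha, so that positive-advantage samples are drawn far more often. The function names are hypothetical. The advantage values are synthetic rather than produced by a learned critic, and the "policy" is a uniform placeholder.

```python
import numpy as np

def advantage_filter(advantages: np.ndarray) -> np.ndarray:
    """Assumed binary filter f(A) = 1[A > 0]: keep actions estimated to be better than average."""
    return (advantages > 0).astype(np.float64)

def priority_probs(advantages: np.ndarray, alpha: float = 1.0, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical PER-style sampling distribution: p_i proportional to (max(A_i, 0) + eps) ** alpha."""
    priorities = (np.maximum(advantages, 0.0) + eps) ** alpha
    return priorities / priorities.sum()

def filtered_bc_loss(log_probs: np.ndarray, advantages: np.ndarray) -> float:
    """Mean negative log-likelihood of dataset actions over samples that pass the filter."""
    weights = advantage_filter(advantages)
    return float(-(weights * log_probs).sum() / max(weights.sum(), 1.0))

# Synthetic dataset with a 65:1 noise ratio: 6,500 noisy samples, then 100 expert samples.
rng = np.random.default_rng(0)
n_noise, n_expert, batch_size = 6500, 100, 256
advantages = np.concatenate([
    -np.abs(rng.normal(1.0, 0.5, n_noise)),        # noise: non-positive advantage (made-up values)
    np.abs(rng.normal(1.0, 0.2, n_expert)) + 0.5,  # expert: clearly positive (made-up values)
])
n = n_noise + n_expert

# Compare how many expert samples (indices >= n_noise) each sampling scheme draws.
uniform_batch = rng.choice(n, size=batch_size)
prioritized_batch = rng.choice(n, size=batch_size, p=priority_probs(advantages))
uniform_hits = int((uniform_batch >= n_noise).sum())
prioritized_hits = int((prioritized_batch >= n_noise).sum())
print("expert samples per batch, uniform:", uniform_hits)
print("expert samples per batch, prioritized:", prioritized_hits)

# Placeholder policy: uniform over 4 actions, so every dataset action has log-prob log(0.25).
log_probs = np.full(batch_size, np.log(0.25))
loss = filtered_bc_loss(log_probs, advantages[prioritized_batch])
print("filtered BC loss on prioritized batch:", loss)
```

In this sketch the filter decides which actions the policy imitates, and the prioritized sampler decides how often those actions appear in a batch. Under uniform sampling, most gradient steps would be spent on samples that the filter discards.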
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function
