Quantile Filtered Imitation Learning

David Brandfonbrener; William F. Whitney; Rajesh Ranganath; Joan Bruna

arXiv:2112.00950·cs.LG·December 3, 2021

Open Access

TL;DR

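Quantile Filtered Imitation Learning (QFIL) is a policy improvement operator for offline RL: it runs imitation learning on a filtered dataset that keeps only the state-action pairs whose estimated Q-values reach a chosen quantile of the values induced by the behavior policy. The quantile is a hyperparameter that trades off the bias and variance of the improvement step.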

Contribution

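QFIL defines a pushforward distribution over Q-values, induced by sampling actions from the behavior policy, and filters the offline dataset at a quantile of that distribution. The authors prove that the resulting step is a safe policy improvement under function approximation and evaluate the method empirically.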

Findings

01

A synthetic experiment illustrates how QFIL trades off bias and variance in the policy improvement step.

02

QFIL achieves strong performance on the D4RL benchmark.

03

The choice of quantile is a natural hyperparameter for trading off the bias and variance of the improvement step.

Abstract

We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning. QFIL performs policy improvement by running imitation learning on a filtered version of the offline dataset. The filtering process removes $(s, a)$ pairs whose estimated Q values fall below a given quantile of the pushforward distribution over values induced by sampling actions from the behavior policy. The definitions of both the pushforward Q distribution and resulting value function quantile are key contributions of our method. We prove that QFIL gives us a safe policy improvement step with function approximation and that the choice of quantile provides a natural hyperparameter to trade off bias and variance of the improvement step. Empirically, we perform a synthetic experiment illustrating how QFIL effectively makes a bias-variance tradeoff and…
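
The filtering step described in the abstract can be sketched for a small tabular problem. This is an illustrative sketch, not the authors' implementation: it assumes discrete states and actions, a given Q table standing in for the estimated Q function, and a known behavior policy, whereas the paper works with function approximation. The function names (`pushforward_quantile`, `quantile_filter`, `imitate`) and the parameter `tau` are hypothetical names chosen for this example.

```python
import numpy as np


def pushforward_quantile(q_values, behavior_probs, tau, n_samples=1000, rng=None):
    """Per state, estimate the tau-quantile of Q(s, a') with a' ~ behavior policy.

    q_values and behavior_probs are (n_states, n_actions) arrays (assumed inputs).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_states, n_actions = q_values.shape
    thresholds = np.empty(n_states)
    for s in range(n_states):
        # Sample actions from the behavior policy and push them through Q.
        sampled = rng.choice(n_actions, size=n_samples, p=behavior_probs[s])
        thresholds[s] = np.quantile(q_values[s, sampled], tau)
    return thresholds


def quantile_filter(states, actions, q_values, behavior_probs, tau):
    """Keep only dataset pairs whose Q value reaches the per-state quantile."""
    thresholds = pushforward_quantile(q_values, behavior_probs, tau)
    keep = q_values[states, actions] >= thresholds[states]
    return states[keep], actions[keep]


def imitate(states, actions, n_states, n_actions):
    """Tabular imitation learning: empirical action frequencies per state."""
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (states, actions), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    # States with no remaining data fall back to a uniform policy (an assumption).
    uniform = np.full_like(counts, 1.0 / n_actions)
    return np.divide(counts, totals, out=uniform, where=totals > 0)


# Toy data: 2 states, 3 actions, uniform behavior policy, one sample per pair.
q = np.array([[0.0, 1.0, 2.0], [5.0, 3.0, 4.0]])
beta = np.full((2, 3), 1.0 / 3.0)
data_s = np.array([0, 0, 0, 1, 1, 1])
data_a = np.array([0, 1, 2, 0, 1, 2])

kept_s, kept_a = quantile_filter(data_s, data_a, q, beta, tau=0.5)
policy = imitate(kept_s, kept_a, n_states=2, n_actions=3)
```

In this sketch, `tau = 0` keeps the whole dataset, so the imitation step simply clones the behavior policy, while values of `tau` close to 1 keep only the highest-valued actions and leave fewer samples to imitate. That is one way to picture the bias-variance tradeoff the abstract attributes to the choice of quantile.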

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks