Quantile Filtered Imitation Learning
David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

TL;DR
Quantile Filtered Imitation Learning (QFIL) is an offline RL method that runs imitation learning on a dataset filtered by Q-value quantiles. The quantile level trades off bias and variance of the policy improvement step.
Contribution
QFIL introduces a novel filtering technique based on Q-value quantiles for safe policy improvement in offline RL, with theoretical guarantees and empirical validation.
Findings
QFIL effectively balances bias and variance in policy improvement.
QFIL achieves strong performance on the D4RL benchmark.
The choice of quantile acts as a hyperparameter for trading off bias and variance of the improvement step.
Abstract
We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning. QFIL performs policy improvement by running imitation learning on a filtered version of the offline dataset. The filtering process removes state-action pairs whose estimated Q values fall below a given quantile of the pushforward distribution over values induced by sampling actions from the behavior policy. The definitions of both the pushforward Q distribution and resulting value function quantile are key contributions of our method. We prove that QFIL gives us a safe policy improvement step with function approximation and that the choice of quantile provides a natural hyperparameter to trade off bias and variance of the improvement step. Empirically, we perform a synthetic experiment illustrating how QFIL effectively makes a bias-variance tradeoff and…
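The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `qfil_filter_mask` is hypothetical, the Q estimates are assumed to be given as arrays, and the per-state quantile is approximated empirically from a fixed number of actions sampled from an (estimated) behavior policy.

```python
import numpy as np

def qfil_filter_mask(q_data, q_behavior_samples, tau):
    """Return a boolean mask over dataset pairs (hypothetical helper).

    q_data: shape (N,), estimated Q(s_i, a_i) for each dataset pair.
    q_behavior_samples: shape (N, K), estimated Q(s_i, a') for K actions a'
        sampled from the behavior policy at state s_i (assumed precomputed).
    tau: quantile level in [0, 1].
    """
    # Empirical tau-quantile of the pushforward Q distribution at each state.
    thresholds = np.quantile(q_behavior_samples, tau, axis=1)
    # Keep a pair only if its Q estimate reaches the per-state threshold.
    return q_data >= thresholds

# Toy values, invented purely for illustration.
q_samples = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
                      [10.0, 11.0, 12.0, 13.0, 14.0]])
q_data = np.array([3.0, 11.0])

mask = qfil_filter_mask(q_data, q_samples, tau=0.5)
keep_all = qfil_filter_mask(q_data, q_samples, tau=0.0)
```

In this toy case the median thresholds are 2.0 and 12.0, so the first pair is kept and the second is dropped. With `tau=0.0` both pairs are kept, so imitation learning on the filtered set would use the full dataset. Larger `tau` keeps fewer, higher-valued pairs, which is one way to picture the bias-variance tradeoff the abstract mentions.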
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
