Unified Algorithms for RL with Decision-Estimation Coefficients: PAC, Reward-Free, Preference-Based Learning, and Beyond
Fan Chen, Song Mei, Yu Bai

TL;DR
This paper introduces a unified algorithmic framework based on the Decision-Estimation Coefficient for efficiently addressing various reinforcement learning goals, including exploration, model estimation, and preference learning.
Contribution
The paper develops a generalized DEC framework that unifies multiple RL learning goals and provides a basis for new sample-efficient algorithms and lower bounds.
Findings
Unified framework covers multiple RL goals
New sample-efficient results for diverse learning tasks
Re-analysis of existing algorithms with DEC bounds
Abstract
Modern Reinforcement Learning (RL) is more than just learning the optimal policy; alternative learning goals such as exploring the environment, estimating the underlying model, and learning from preference feedback are all of practical importance. While provably sample-efficient algorithms have been proposed for each specific goal, these algorithms often depend strongly on the particular learning goal and thus have correspondingly different structures. It is a pressing open question whether these learning goals can instead be tackled by a single unified algorithm. We make progress on this question by developing a unified algorithmic framework for a large class of learning goals, building on the Decision-Estimation Coefficient (DEC) framework. Our framework handles many learning goals such as no-regret RL, PAC RL, reward-free learning, model estimation, and preference-based learning, all…
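For readers new to the DEC, the sketch below numerically evaluates the original (regret) DEC introduced by Foster et al. (2021), which this paper builds on: the infimum over decision distributions p of the worst case, over models M, of the expected suboptimality under M minus gamma times the squared Hellinger distance between M and a reference model. It assumes a hypothetical two-armed Bernoulli bandit with a finite model class, and it approximates the infimum over the simplex with a uniform grid. The model means, the reference model, the grid resolution, and all function names are illustrative assumptions. The paper's generalized DEC for other goals (PAC, reward-free, preference-based learning) is defined in the paper itself and is not implemented here.

```python
import math


def hellinger_sq(a: float, b: float) -> float:
    """Squared Hellinger distance between Bernoulli(a) and Bernoulli(b) (no 1/2 factor)."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 + (math.sqrt(1 - a) - math.sqrt(1 - b)) ** 2


def dec_objective(p, models, ref, gamma):
    """Worst case over models M of E_{pi~p}[f^M(pi_M) - f^M(pi) - gamma * D_H^2(M(pi), ref(pi))].

    Each model is a tuple of arm means (hypothetical toy setup); ref is the reference model.
    """
    worst = -math.inf
    for means in models:
        best = max(means)  # value of the optimal arm under this model
        val = sum(
            p_i * (best - means[i] - gamma * hellinger_sq(means[i], ref[i]))
            for i, p_i in enumerate(p)
        )
        worst = max(worst, val)
    return worst


def dec_grid(models, ref, gamma, grid=1000):
    """Approximate the infimum over 2-arm distributions p = (t, 1 - t) with a uniform grid on t."""
    best_val, best_p = math.inf, None
    for k in range(grid + 1):
        t = k / grid
        p = (t, 1 - t)
        v = dec_objective(p, models, ref, gamma)
        if v < best_val:  # keep the first grid point achieving the minimum
            best_val, best_p = v, p
    return best_val, best_p


if __name__ == "__main__":
    models = [(0.6, 0.4), (0.4, 0.6)]  # two hypothetical models that disagree on the best arm
    ref = (0.5, 0.5)                   # hypothetical reference model
    for gamma in (1.0, 10.0):
        val, p = dec_grid(models, ref, gamma)
        print(f"gamma={gamma}: DEC ~ {val:.4f} at p={p}")
```

In this symmetric toy example the minimizing distribution is uniform over the two arms, and increasing gamma lowers the DEC because information gain (the Hellinger term) is weighted more heavily against regret.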
