Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL
Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

TL;DR
This paper introduces a primal-dual perspective on exploration in online reinforcement learning, proposing a value-incentivized actor-critic method with theoretical guarantees for sample efficiency.
Contribution
It presents a novel value-incentivized actor-critic (VAC) algorithm, derived from a primal-dual view of optimism, that unifies exploration and exploitation in a single objective and comes with theoretical performance bounds.
Findings
Achieves near-optimal regret in linear MDPs.
Provides a unified framework for exploration via primal-dual interpretation.
Extensible to general function approximation under certain conditions.
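The primal-dual reading of optimism can be illustrated with a standard Lagrangian argument. The block below is a generic sketch in hypothetical notation (a model class F, an estimated value V_f, a data-fit loss L_D(f), a confidence level β, and a weight α > 0). It is not claimed to be the paper's exact formulation. It shows how a constrained "optimism over a confidence set" problem becomes a single regularized objective once the constraint is relaxed with a fixed multiplier λ = 1/α.

```latex
% Constrained optimism: the most optimistic model among those consistent with the data
\max_{f \in \mathcal{F}} \; V_f \quad \text{s.t.} \quad \mathcal{L}_{\mathcal{D}}(f) \le \beta

% Lagrangian form with multiplier \lambda \ge 0 (the inner min is -\infty for infeasible f)
\max_{f \in \mathcal{F}} \; \min_{\lambda \ge 0} \; V_f + \lambda \big( \beta - \mathcal{L}_{\mathcal{D}}(f) \big)

% Fixing \lambda = 1/\alpha and dropping the constant \beta/\alpha yields one regularized objective
\max_{f \in \mathcal{F}} \; V_f - \tfrac{1}{\alpha} \, \mathcal{L}_{\mathcal{D}}(f)
\;\Longleftrightarrow\;
\min_{f \in \mathcal{F}} \; \mathcal{L}_{\mathcal{D}}(f) - \alpha \, V_f
```

The last line has the "consistent with data, incentivized toward higher value" shape described in the abstract. α controls how strongly optimism (exploration) is weighted against data fit (exploitation).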
Abstract
Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we still lack efficient and practical schemes that are backed by theoretical performance guarantees. Motivated by recent developments in exploration via optimistic regularization, this paper provides an interpretation of the principle of optimism through the lens of primal-dual optimization. From this fresh perspective, we set forth a new value-incentivized actor-critic (VAC) method, which optimizes a single easy-to-optimize objective integrating exploration and exploitation: it promotes state-action and policy estimates that…
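To make "a single objective integrating exploration and exploitation" concrete, here is a toy sketch that is not the paper's VAC objective or algorithm. It assumes a K-armed bandit with Gaussian rewards, value estimates q, a deterministic policy that plays one arm, and a count-weighted squared-error data-fit term. The names `optimistic_scores`, `joint_objective`, `maximizing_q`, and `run_bandit` are hypothetical. Jointly maximizing "α × value of the chosen arm minus data-fit loss" over (q, policy) has a closed form. In that closed form, an exploration bonus α/(2n_a) emerges without being added by hand.

```python
import random


def optimistic_scores(means, counts, alpha):
    """Closed-form optimum of the toy objective for each arm, divided by alpha.

    For a fixed policy that plays arm a, maximizing
        alpha * q[a] - sum_b (counts[b] / 2) * (q[b] - means[b]) ** 2
    over q gives q[a] = means[a] + alpha / counts[a] (other q[b] stay at means[b]),
    and the optimal objective equals alpha * means[a] + alpha**2 / (2 * counts[a]).
    """
    return [m + alpha / (2.0 * n) for m, n in zip(means, counts)]


def joint_objective(q, arm, means, counts, alpha):
    """Value incentive (alpha * q[arm]) minus a count-weighted squared data-fit loss."""
    value_term = alpha * q[arm]
    data_fit = sum(0.5 * n * (qb - m) ** 2 for qb, m, n in zip(q, means, counts))
    return value_term - data_fit


def maximizing_q(arm, means, counts, alpha):
    """The q that maximizes joint_objective for a fixed arm (see optimistic_scores)."""
    q = list(means)
    q[arm] = means[arm] + alpha / counts[arm]
    return q


def run_bandit(true_means, horizon, alpha, noise=0.1, seed=0):
    """Each round, play the arm of the (q, policy) pair that maximizes the toy objective."""
    rng = random.Random(seed)
    k = len(true_means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so all counts are positive
        else:
            means = [s / n for s, n in zip(sums, counts)]
            scores = optimistic_scores(means, counts, alpha)
            arm = max(range(k), key=scores.__getitem__)
        sums[arm] += rng.gauss(true_means[arm], noise)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    print(run_bandit([0.0, 0.5, 1.0], horizon=500, alpha=1.0))
```

The data-fit term keeps q anchored to the empirical means, and the value term pushes the chosen arm's estimate upward in inverse proportion to its pull count. Note that this toy bonus decays like 1/n rather than the 1/sqrt(n) of UCB-style bonuses, so it is not claimed to inherit the regret guarantees stated for VAC.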
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
