Dual Approximation Policy Optimization
Zhihan Xiong, Maryam Fazel, Lin Xiao

TL;DR
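The paper introduces DAPO, a policy optimization framework that brings general function approximation into policy mirror descent by using the dual Bregman divergence induced by the mirror map for policy projection. It achieves fast linear convergence and includes several existing methods as special cases.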
Contribution
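It presents a dual approximation framework for policy mirror descent that measures function approximation error with the dual Bregman divergence induced by the mirror map, achieves fast linear convergence, and unifies several existing methods.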
Findings
Achieves fast linear convergence with general function approximation.
Includes several practical methods as special cases.
Gives those special-case methods strong convergence guarantees as a direct consequence of the general analysis.
Abstract
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.
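To make the contrast between the two error measures concrete, the sketch below assumes one specific mirror map: the negative entropy on the probability simplex, whose convex conjugate is the log-sum-exp function. Under that assumption, the dual Bregman divergence between two logit vectors equals the KL divergence between the corresponding softmax policies. The function names and example logits are illustrative and do not come from the paper; this is a small numerical check of a standard identity, not the authors' algorithm.

```python
import math


def logsumexp(theta):
    """Numerically stable log-sum-exp; conjugate of negative entropy on the simplex."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    """Gradient of logsumexp: maps logits (dual space) to a policy (primal space)."""
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


def dual_bregman(theta, theta_ref):
    """Bregman divergence of logsumexp: h*(theta) - h*(theta_ref) - <grad h*(theta_ref), theta - theta_ref>."""
    grad_ref = softmax(theta_ref)
    inner = sum(g * (a - b) for g, a, b in zip(grad_ref, theta, theta_ref))
    return logsumexp(theta) - logsumexp(theta_ref) - inner


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def squared_l2(theta, theta_ref):
    """Squared L2 distance between two logit vectors."""
    return sum((a - b) ** 2 for a, b in zip(theta, theta_ref))


# Hypothetical logits for a single state with three actions.
theta_target = [1.0, 0.0, -1.0]
theta_approx = [1.2, -0.1, -0.8]

d_dual = dual_bregman(theta_approx, theta_target)
d_kl = kl(softmax(theta_target), softmax(theta_approx))

# Shifting every logit by a constant leaves the softmax policy unchanged.
theta_shifted = [t + 5.0 for t in theta_approx]
d_dual_shifted = dual_bregman(theta_shifted, theta_target)
l2_plain = squared_l2(theta_approx, theta_target)
l2_shifted = squared_l2(theta_shifted, theta_target)

print("dual Bregman:", d_dual, "KL:", d_kl)
print("dual Bregman after shift:", d_dual_shifted)
print("squared L2 before/after shift:", l2_plain, l2_shifted)
```

In this example the constant shift does not change the policy, and the dual Bregman divergence is unaffected by it, while the squared L2 distance between the logits grows. This is one way to see why an error measure tied to the mirror map can track policy-level discrepancy more faithfully than a norm on the raw approximation targets.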
Taxonomy
Topics: Optimization and Search Problems
Methods: Dual Approximation Policy Optimization (DAPO)
