Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)
Huang Bojun

TL;DR
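This paper formulates optimal Q-functions as saddle points of a nonlinear Lagrangian derived from the Bellman optimality equation, shows that the Lagrangian enjoys strong duality, and uses the duality theory to build an imitation learning algorithm that is applied to a machine translation benchmark.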
Contribution
It formulates Q-function learning as a saddle point problem over a nonlinear Lagrangian derived from the classic Bellman optimality equation, and shows that strong duality holds despite the nonlinearity, which supports a general Lagrangian method for Q-function learning.
Findings
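Strong duality holds for the Lagrangian despite its nonlinearity.
An imitation learning algorithm is developed from the duality theory.
The algorithm is applied to a state-of-the-art machine translation benchmark.
A symmetry breaking phenomenon is demonstrated regarding the optimality of the Lagrangian saddle points.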
Abstract
This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality, in spite of its nonlinearity, which paves the way to a general Lagrangian method for Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on the duality theory, and applies the algorithm to a state-of-the-art machine translation benchmark. The paper then demonstrates a symmetry breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.
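To make the saddle point idea concrete, the following is a minimal sketch on a tiny tabular problem. The two-state MDP, the discount factor, and all names in the code are hypothetical and chosen only for illustration. The Lagrangian form used here (an initial-state objective plus multiplier-weighted Bellman residuals) is an assumption based on the standard linear programming view of the Bellman optimality equation; it is not taken from the paper and may differ from the paper's definition.

```python
# Hypothetical 2-state, 2-action deterministic MDP (illustration only).
GAMMA = 0.5
STATES = [0, 1]
ACTIONS = [0, 1]
NEXT = [[0, 1], [0, 1]]            # NEXT[s][a]: successor state
REWARD = [[0.0, 1.0], [2.0, 0.0]]  # REWARD[s][a]: immediate reward
MU0 = [1.0, 0.0]                   # initial-state distribution


def bellman_backup(q):
    """Apply the Bellman optimality operator once to a Q-table."""
    return [[REWARD[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in ACTIONS]
            for s in STATES]


def solve_q_star(iters=200):
    """Approximate Q* by repeated Bellman optimality backups."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(iters):
        q = bellman_backup(q)
    return q


def lagrangian(q, rho):
    """ASSUMED Lagrangian form (not the paper's definition):
    (1 - gamma) * E_mu0[max_a Q(s, a)]
      + sum_{s,a} rho(s, a) * (Bellman backup of Q at (s, a) - Q(s, a)).
    The max over actions is what makes this function nonlinear in Q.
    """
    backed = bellman_backup(q)
    objective = (1.0 - GAMMA) * sum(MU0[s] * max(q[s]) for s in STATES)
    penalty = sum(rho[s][a] * (backed[s][a] - q[s][a])
                  for s in STATES for a in ACTIONS)
    return objective + penalty
```

Under this assumed form, the Bellman residual vanishes at the optimal Q-function, so the value of `lagrangian(q_star, rho)` does not depend on the multipliers `rho`. That independence is one necessary property of a saddle point; the sketch does not verify strong duality itself.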
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Neural Networks and Applications
