Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)

Huang Bojun

arXiv:2207.11161·cs.LG·August 30, 2022

Open Access

TL;DR

This paper introduces a Lagrangian framework for learning optimal Q-functions: it shows that the underlying nonlinear Lagrangian enjoys strong duality, derives an imitation learning algorithm from that duality theory, and applies the algorithm to a machine translation benchmark.

Contribution

It formulates Q-function learning as a saddle point problem using a nonlinear Lagrangian, providing a new theoretical foundation and practical algorithms for reinforcement learning.

Findings

01. Strong duality holds for the Lagrangian despite its nonlinearity.

02. An imitation learning algorithm is developed from the duality theory.

03. The algorithm is applied to a state-of-the-art machine translation benchmark.

Abstract

This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality, in spite of its nonlinearity, which paves the way to a general Lagrangian method for Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on the duality theory, and applies the algorithm to a state-of-the-art machine translation benchmark. The paper then demonstrates a symmetry breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.
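For readers who want the saddle-point formulation spelled out, the following is a minimal sketch of how a Lagrangian can be built from the Bellman optimality equation. Only the first equation is the classic, standard statement. The inequality-constrained program, the weighting $\mu$, the multipliers $\lambda$, and the specific form of $L$ are illustrative assumptions and may differ from the construction actually used in the paper.

```latex
% Classic Bellman optimality equation (standard):
Q^*(s,a) \;=\; r(s,a) \;+\; \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
  \Big[ \max_{a'} Q^*(s',a') \Big]

% Assumed relaxation into a constrained program (illustrative, not from the paper):
% \mu is a hypothetical positive weighting over state-action pairs.
\min_{Q} \; \sum_{s,a} \mu(s,a)\, Q(s,a)
\quad \text{s.t.} \quad
Q(s,a) \;\ge\; r(s,a) + \gamma \, \mathbb{E}_{s'}\Big[ \max_{a'} Q(s',a') \Big]
\quad \forall (s,a)

% Assumed Lagrangian with multipliers \lambda(s,a) \ge 0 (illustrative):
L(Q,\lambda) \;=\; \sum_{s,a} \mu(s,a)\, Q(s,a)
  \;+\; \sum_{s,a} \lambda(s,a) \Big( r(s,a)
  + \gamma \, \mathbb{E}_{s'}\Big[ \max_{a'} Q(s',a') \Big] - Q(s,a) \Big)

% Saddle-point problem; strong duality means the two orders of optimization agree:
\min_{Q} \max_{\lambda \ge 0} L(Q,\lambda) \;=\; \max_{\lambda \ge 0} \min_{Q} L(Q,\lambda)
```

In this sketch the $\max_{a'}$ term is what makes $L$ nonlinear in $Q$, which is why strong duality does not follow directly from linear programming duality and has to be established separately, as the abstract states.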

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Neural Networks and Applications