Meta-Value Learning: a General Framework for Learning with Learning Awareness

Tim Cooijmans; Milad Aghajohari; Aaron Courville

arXiv:2307.08863 · cs.LG · December 12, 2023 · 1 citation

TL;DR

This paper introduces Meta-Value Learning (MeVa), a framework that evaluates joint policies based on their long-term optimization prospects, improving multi-agent gradient-based learning by considering future interactions.

Contribution

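The paper proposes judging joint policies by their meta-value, a discounted sum over the returns of future optimization iterates. The meta-value is learned with a form of Q-learning that avoids explicitly representing the continuous action space of policy updates and does not require REINFORCE estimators.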

Findings

01 MeVa is consistent and far-sighted.

02 Its behavior is analyzed on a toy game and compared to prior work on repeated matrix games.

03 The meta-value measures the long-term prospects of a joint policy under continued optimization.

Abstract

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare to prior work on repeated matrix games.
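
To make the phrase "a discounted sum over the returns of future optimization iterates" concrete, here is a minimal sketch that estimates such a sum by brute-force rollout. Everything in it is an illustrative assumption rather than the paper's method: the two-player quadratic game, the function names, the step size `alpha`, the discount `gamma`, the choice to start the sum at the first updated iterate, and the use of plain simultaneous gradient ascent for the rollout. MeVa itself learns the meta-value with a form of Q-learning instead of unrolling the optimization like this.

```python
def returns(x1, x2):
    """Hypothetical two-player quadratic game (not taken from the paper)."""
    f1 = -x1 ** 2 + x1 * x2
    f2 = -x2 ** 2 + x1 * x2
    return f1, f2


def naive_gradients(x1, x2):
    """Gradient of each player's own return w.r.t. its own parameter."""
    g1 = -2.0 * x1 + x2  # d f1 / d x1
    g2 = -2.0 * x2 + x1  # d f2 / d x2
    return g1, g2


def rollout_value(x1, x2, alpha=0.1, gamma=0.9, steps=50):
    """Truncated discounted sum of returns over future optimization iterates.

    Assumption: the iterates come from simultaneous naive gradient ascent,
    and the sum starts at the first updated iterate with weight gamma**0.
    """
    v1, v2 = 0.0, 0.0
    discount = 1.0
    for _ in range(steps):
        g1, g2 = naive_gradients(x1, x2)
        # Both players update at once, each using only its own gradient.
        x1, x2 = x1 + alpha * g1, x2 + alpha * g2
        f1, f2 = returns(x1, x2)
        v1 += discount * f1
        v2 += discount * f2
        discount *= gamma
    return v1, v2


if __name__ == "__main__":
    print(rollout_value(1.0, 0.0))
```

Comparing `rollout_value` at two different joint policies gives a rough sense of what judging policies by long-term prospects means: a point with a lower immediate return can still score higher if optimization from there leads somewhere better.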

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

metavaluelearning/metavaluelearning · Official

Videos

No videos yet.

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research

Methods: Q-Learning · REINFORCE