Stackelberg Coupling of Online Representation Learning and Reinforcement Learning

Fernando Martinez; Tao Li; Yingdong Lu; Juntao Chen

arXiv:2508.07452 · cs.LG · January 30, 2026

TL;DR

This paper introduces SCORER, a hierarchical framework that models representation learning and value learning as two strategic agents in a Stackelberg game, leading to more stable and efficient deep reinforcement learning.

Contribution

The paper proposes a novel bi-level optimization framework, SCORER, that separates the updates of value and representation networks to improve stability and performance in RL.
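
To make "separate updates" concrete, here is a minimal two-timescale sketch based only on the abstract's description: the Q-function head acts as the leader and updates less often, and the representation (perception) network acts as the follower and updates every step. Everything specific is an assumption for illustration, not the authors' method: the random toy MDP, tanh features over one-hot states, semi-gradient TD targets, the step sizes, and the hypothetical helpers `make_toy_mdp`, `features`, `q_value`, `train` and parameter `leader_period`. The sketch also does not model the leader anticipating the follower's best response, which a full Stackelberg treatment would include.

```python
import math
import random

# Hypothetical toy sizes; nothing here is taken from the SCORER paper.
N_STATES, N_ACTIONS, DIM = 5, 2, 3


def make_toy_mdp(seed):
    """Random deterministic MDP: next_state[s][a] and reward[s][a]."""
    rng = random.Random(seed)
    next_state = [[rng.randrange(N_STATES) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    reward = [[rng.uniform(-1.0, 1.0) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    return next_state, reward


def features(W, s):
    """Follower (representation): phi(s) = tanh(W[:, s]) for a one-hot state input."""
    return [math.tanh(W[k][s]) for k in range(DIM)]


def q_value(W, theta, s, a):
    """Leader (value head): Q(s, a) = phi(s) . theta[:, a]."""
    phi = features(W, s)
    return sum(phi[k] * theta[k][a] for k in range(DIM))


def train(num_steps=2000, leader_period=10, lr=0.05, gamma=0.9, seed=0):
    """Two-timescale semi-gradient Q-learning on uniformly sampled (s, a) pairs."""
    rng = random.Random(seed)
    next_state, reward = make_toy_mdp(seed)
    W = [[rng.gauss(0.0, 0.5) for _ in range(N_STATES)] for _ in range(DIM)]
    theta = [[rng.gauss(0.0, 0.5) for _ in range(N_ACTIONS)] for _ in range(DIM)]
    leader_updates = follower_updates = 0
    for step in range(1, num_steps + 1):
        s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)
        s_next = next_state[s][a]
        # Bootstrapped target, treated as a constant (no gradient flows through it).
        target = reward[s][a] + gamma * max(q_value(W, theta, s_next, b) for b in range(N_ACTIONS))
        phi = features(W, s)
        td_error = sum(phi[k] * theta[k][a] for k in range(DIM)) - target
        # Gradients of 0.5 * td_error**2, both taken at the pre-update parameters.
        grad_W = [td_error * theta[k][a] * (1.0 - phi[k] ** 2) for k in range(DIM)]
        grad_theta = [td_error * phi[k] for k in range(DIM)]
        # Follower (representation) responds every step to the currently committed head.
        for k in range(DIM):
            W[k][s] -= lr * grad_W[k]
        follower_updates += 1
        # Leader (value head) commits: it moves only once every `leader_period` steps.
        if step % leader_period == 0:
            for k in range(DIM):
                theta[k][a] -= lr * grad_theta[k]
            leader_updates += 1
    return W, theta, leader_updates, follower_updates
```

For example, `train(num_steps=2000, leader_period=10)` performs 2000 follower updates but only 200 leader updates, so the representation repeatedly adapts to a value head that stays fixed between its infrequent updates.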

Findings

1. SCORER reduces bias and variance in value estimates.
2. Experimental results show improved stability and performance over traditional methods.
3. Gains are attributed to algorithmic design rather than increased model complexity.

Abstract

Deep Q-learning jointly learns representations and values within monolithic networks, promising beneficial co-adaptation between features and value estimates. Although this architecture has attained substantial success, the coupling between representation and value learning creates instability as representations must constantly adapt to non-stationary value targets, while value estimates depend on these shifting representations. This is compounded by high variance in bootstrapped targets, which causes bias in value estimation in off-policy methods. We introduce Stackelberg Coupled Representation and Reinforcement Learning (SCORER), a framework for value-based RL that views representation and Q-learning as two strategic agents in a hierarchical game. SCORER models the Q-function as the leader, which commits to its strategy by updating less frequently, while the perception network…
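
The abstract's remark that high-variance bootstrapped targets bias off-policy value estimates can be illustrated with the well-known overestimation effect of the max operator in Q-learning targets. The sketch below is a generic illustration, not code from SCORER: it assumes every true action value is 0 and that each estimate carries independent zero-mean Gaussian noise, and `mean_max_of_noisy_estimates` is a hypothetical helper.

```python
import random


def mean_max_of_noisy_estimates(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Average of max_a Q_hat(a), where every true Q(a) is 0 and Q_hat(a) = Q(a) + noise.

    Because the true maximum is 0, the returned value estimates the bias of a max-based target.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Independent zero-mean Gaussian estimation noise on each action value.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(estimates)
    return total / trials


for std in (0.5, 1.0, 2.0):
    print(std, round(mean_max_of_noisy_estimates(noise_std=std), 2))
```

Since the max is convex, E[max_a Q_hat(a)] >= max_a E[Q_hat(a)], so the target is biased upward even though each estimate is unbiased, and the bias grows with the noise level. For 10 actions with unit noise the expected maximum is about 1.54 although every true value is 0.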

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications