Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

Jing Sun

arXiv:2604.13517·cs.LG·May 22, 2026

TL;DR

This paper shows that naively fusing multi-timescale signals in PPO leads to surrogate objective hacking and myopic degeneration, and proposes a Target Decoupling method that improves performance and stability in long-term planning tasks.

Contribution

The paper introduces a Target Decoupling architecture that isolates short-term and long-term signals, preventing surrogate hacking and myopic degeneration in multi-timescale PPO.

Findings

01 Achieves statistically significant improvements in LunarLander-v2.

02 Surpasses the 'Environment Solved' threshold with minimal variance.

03 Eliminates policy collapse and avoids local optima traps.

Abstract

Temporal credit assignment in reinforcement learning has long been a central challenge. Inspired by the multi-timescale encoding of the dopamine system in neurobiology, recent research has sought to introduce multiple discount factors into Actor-Critic architectures, such as Proximal Policy Optimization (PPO), to balance short-term responses with long-term planning. However, this paper reveals that blindly fusing multi-timescale signals in complex delayed-reward tasks can lead to severe algorithmic pathologies. We systematically demonstrate that exposing a temporal attention routing mechanism to policy gradients results in surrogate objective hacking, while adopting gradient-free uncertainty weighting triggers irreversible myopic degeneration, a phenomenon we term the Paradox of Temporal Uncertainty. To address these issues, we propose a Target Decoupling architecture: on the Critic…

Code & Models

Repositories

ben-dlwlrma/Representation-Over-Routing (github)

Models

ben-dlwlrma/Representation-Over-Routing (🤗 model)
