Reinforcing the World's Edge: A Continual Learning Problem in the Multi-Agent-World Boundary

Dane Malenfant

arXiv:2603.06813·cs.AI·March 10, 2026

Open Access

TL;DR

This paper investigates how the agent-world boundary affects decision structure transfer in reinforcement learning, especially in multi-agent settings, highlighting the challenges of boundary drift and its impact on invariant cores.

Contribution

It introduces a formal analysis of the agent-world boundary in decentralized MARL and demonstrates how boundary drift causes the loss of invariant decision cores across episodes.

Findings

01 Invariant cores can be constructed in stationary MDPs

02 Boundary drift causes the invariant core to shrink or vanish

03 Policy updates induce non-stationarity affecting decision invariants

Abstract

Reusable decision structure survives across episodes in reinforcement learning, but whether it does depends on how the agent-world boundary is drawn. In stationary, finite-horizon MDPs, one can construct an invariant core: the (not necessarily contiguous) subsequences of state-action pairs shared by all successful trajectories, optionally under a simple abstraction. Under mild goal-conditioned assumptions, its existence can be proven and explained by how the core captures prototypes that transfer across episodes. When the same task is embedded in a decentralized Markov game and the peer agent is folded into the world, each peer-policy update induces a new MDP; the per-episode invariant core can shrink or vanish, even under small changes to the induced world dynamics, sometimes leaving only the individual task core or nothing at all. This policy-induced non-stationarity can be quantified with a…
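
To make the notion of an invariant core concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes trajectories are lists of hashable (state, action) pairs and approximates the core by folding a pairwise longest-common-subsequence over the successful trajectories. The result is always a subsequence common to all of them, though not guaranteed to be the longest one. The function names and the toy key-and-door trajectories are hypothetical, chosen only to illustrate how trajectories from a changed induced MDP can shrink or empty the core.

```python
from typing import Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (state, action); an assumed encoding


def lcs(a: Sequence[Pair], b: Sequence[Pair]) -> List[Pair]:
    """Longest common (not necessarily contiguous) subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out: List[Pair] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def approx_core(trajectories: Sequence[Sequence[Pair]]) -> List[Pair]:
    """Fold pairwise LCS over all trajectories (a heuristic, see lead-in)."""
    if not trajectories:
        return []
    core = list(trajectories[0])
    for traj in trajectories[1:]:
        core = lcs(core, traj)
    return core


# Hypothetical successful trajectories in a fixed (stationary) world.
t1 = [("s0", "right"), ("key", "pick"), ("s2", "up"), ("door", "open")]
t2 = [("s0", "up"), ("s1", "right"), ("key", "pick"), ("door", "open")]
# Hypothetical successes after the peer's policy changed the induced world:
# in t3 the key step is no longer needed, in t4 nothing is shared at all.
t3 = [("s0", "up"), ("s1", "right"), ("door", "open")]
t4 = [("s0", "wait"), ("s3", "pass")]

stationary_core = approx_core([t1, t2])
shrunk_core = approx_core([t1, t2, t3])
vanished_core = approx_core([t1, t2, t3, t4])
```

In this toy data the stationary core keeps the key and door steps, adding `t3` leaves only the door step, and adding `t4` leaves an empty core, mirroring the shrink-or-vanish behaviour the abstract describes.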

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research