Survival is the Only Reward: Sustainable Self-Training Through Environment-Mediated Selection
Jennifer Dodgson, Alfath Daryl Alhajir, Michael Joedhitya, Akira Rafhael Janson Pattirane, Surender Suresh Kumar, Joseph Lim, C.H. Peh, Adith Ramdas, Steven Zhang Zhexu

TL;DR
This paper proposes a novel self-training architecture that relies solely on environmental viability for selection, avoiding reward hacking and enabling sustainable, open-ended self-improvement in autonomous systems.
Contribution
The paper introduces environment-mediated selection as a new paradigm for stable self-training, demonstrating its effectiveness and distinctive dynamics relative to reward-based methods.
Findings
Effective strategies persist through consolidation and pruning.
Models develop meta-learning behaviors without explicit instructions.
Environment-grounded selection enables sustainable self-improvement.
Abstract
Self-training systems often degenerate due to the lack of an external criterion for judging data quality, leading to reward hacking and semantic drift. This paper provides a proof-of-concept system architecture for stable self-training under sparse external feedback and bounded memory, and empirically characterises its learning dynamics and failure modes. We introduce a self-training architecture in which learning is mediated exclusively by environmental viability, rather than by reward, objective functions, or externally defined fitness criteria. Candidate behaviours are executed under real resource constraints, and only those whose environmental effects both persist and preserve the possibility of future interaction are propagated. The environment does not provide semantic feedback, dense rewards, or task-specific supervision; selection operates solely through differential survival…
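The selection rule in the abstract is a binary survival test rather than a scored objective. A candidate is propagated only if its effect persists and the environment remains open to future interaction. The minimal sketch below illustrates that rule under stated assumptions; it is not the authors' implementation. Everything in it is hypothetical and introduced only for illustration: the names `Environment`, `survives`, `select` and `memory_limit`, the integer resource budget, the rule that artifacts prefixed `tmp` decay, running each candidate on a copy of the environment, and oldest-first pruning as the bounded-memory policy.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Environment:
    """Hypothetical toy environment: a resource budget plus persistent artifacts."""
    resources: int
    artifacts: set = field(default_factory=set)

    def decay(self) -> None:
        # Hypothetical persistence rule: artifacts named "tmp..." do not survive.
        self.artifacts = {a for a in self.artifacts if not a.startswith("tmp")}

    def viable(self) -> bool:
        # Future interaction stays possible only while resources remain positive.
        return self.resources > 0


# A behaviour acts on the environment and names the artifact it produced.
Behaviour = Callable[[Environment], str]


def survives(behaviour: Behaviour, env: Environment) -> bool:
    """Binary survival test: no reward, score or semantic feedback is computed."""
    trial = copy.deepcopy(env)   # assumption: each candidate runs on a copy of env
    artifact = behaviour(trial)  # execution consumes the copy's real resource budget
    trial.decay()                # the environment acts on the behaviour's effect
    # Propagate only if the effect persists AND future interaction remains possible.
    return artifact in trial.artifacts and trial.viable()


def select(pool: List[Behaviour], env: Environment, memory_limit: int) -> List[Behaviour]:
    """Keep survivors; prune to a bounded memory without ranking them."""
    survivors = [b for b in pool if survives(b, env)]
    # Assumed pruning policy: retain the most recent survivors, drop the oldest.
    return survivors[max(0, len(survivors) - memory_limit):]


# Example candidate behaviours (hypothetical).
def build_cheap(env: Environment) -> str:
    env.resources -= 1
    env.artifacts.add("tool")
    return "tool"


def build_transient(env: Environment) -> str:
    env.resources -= 1
    env.artifacts.add("tmp_cache")  # decays, so its effect does not persist
    return "tmp_cache"


def overspend(env: Environment) -> str:
    env.resources -= 10             # exhausts the budget, so env stops being viable
    env.artifacts.add("monument")
    return "monument"
```

With `Environment(resources=5)`, calling `select([build_cheap, build_transient, overspend], env, memory_limit=2)` keeps only `build_cheap`. `build_transient` fails because its effect decays, and `overspend` fails because it leaves no resources for future interaction. Neither rejection involves a reward value. The oldest-first pruning is chosen here only because it avoids introducing a hidden score; the paper's own consolidation and pruning mechanism may differ.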
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Personal Information Management and User Behavior
