Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

Nathan Heath

arXiv:2603.29993 · cs.AI · April 1, 2026

TL;DR

This paper reproduces and extends the MONA environment to evaluate how different approval mechanisms affect reward hacking, highlighting the importance of calibration and foresight in learned approval models.

Contribution

It provides a reproducible Python implementation of MONA, introduces a modular suite of approval mechanisms, and empirically evaluates their impact on reward hacking.
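One way to picture a modular approval suite is as interchangeable functions that score a state-action pair, so the training loop can swap one for another. The sketch below is a minimal illustration under assumed interfaces, not the repository's code. The names `oracle_approval`, `noisy_approval`, `misspecified_approval`, and `learned_approval`, the integer state/action encoding, and the Gaussian-noise and additive-bias models are all hypothetical. The calibrated variant is left out because this summary does not specify its calibration procedure.

```python
import random
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical interface: an approval function scores a (state, action) pair.
ApprovalFn = Callable[[int, int], float]


def oracle_approval(true_value: ApprovalFn) -> ApprovalFn:
    """Far-sighted overseer that returns each action's true long-run value."""
    return lambda s, a: true_value(s, a)


def noisy_approval(base: ApprovalFn, sigma: float, seed: int = 0) -> ApprovalFn:
    """Base approval plus zero-mean Gaussian noise (assumed noise model)."""
    rng = random.Random(seed)
    return lambda s, a: base(s, a) + rng.gauss(0.0, sigma)


def misspecified_approval(base: ApprovalFn, favored_action: int, bias: float) -> ApprovalFn:
    """Base approval with a systematic bias toward one action (assumed bias model)."""
    return lambda s, a: base(s, a) + (bias if a == favored_action else 0.0)


def learned_approval(
    labels: Iterable[tuple[int, int, float]], default: float = 0.0
) -> ApprovalFn:
    """Tabular model fit to labeled (state, action, score) triples by averaging."""
    sums: defaultdict = defaultdict(float)
    counts: defaultdict = defaultdict(int)
    for s, a, y in labels:
        sums[(s, a)] += y
        counts[(s, a)] += 1
    table = {key: sums[key] / counts[key] for key in counts}
    # Pairs never seen in the labels fall back to a default score.
    return lambda s, a: table.get((s, a), default)


# Toy oracle: action 0 is the intended behavior, everything else is disapproved.
def true_value(s: int, a: int) -> float:
    return 1.0 if a == 0 else -1.0


biased = misspecified_approval(oracle_approval(true_value), favored_action=1, bias=3.0)
print(biased(0, 1))  # 2.0: the bias flips a disapproved action to approved
```

The toy usage shows why misspecification matters: a large enough bias makes the overseer approve an action its own oracle disapproves of.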

Findings

01. Oracle MONA achieves a 0.0% reward-hacking rate.
02. Calibrated learned approval reduces the reward-hacking rate to 11.9%.
03. Under-optimization explains the lower behavior rates observed relative to oracle MONA.
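The percentages above are rates over evaluation outcomes. A minimal sketch of how such a rate is computed, assuming each evaluation episode carries a boolean flag (hypothetical here) marking whether the agent hacked; the same counting applies to any other per-episode behavior flag. The function name `rate_pct` and the episode counts are illustrative only.

```python
from typing import Sequence


def rate_pct(flags: Sequence[bool]) -> float:
    """Percentage of evaluation episodes whose (hypothetical) behavior flag is True."""
    if not flags:
        raise ValueError("need at least one episode")
    # sum() over booleans counts the True entries.
    return 100.0 * sum(flags) / len(flags)


# Hypothetical flags: 915 hacked episodes out of 1,000.
hacked = [True] * 915 + [False] * 85
print(rate_pct(hacked))  # 91.5
```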

Abstract

Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal (Farquhar et al., 2025). The original paper identifies a critical open question: how the method of constructing approval (particularly the degree to which approval depends on achieved outcomes) affects whether MONA's safety guarantees hold. We present a reproduction-first extension of the public MONA Camera Dropbox environment that (i) repackages the released codebase as a standard Python project with scripted PPO training, (ii) confirms the published contrast between ordinary RL (91.5% reward-hacking rate) and oracle MONA (0.0% hacking rate) using the released reference arrays, and (iii) introduces a modular learned-approval suite spanning oracle, noisy, misspecified, learned, and calibrated…

Code & Models

Repositories

codernate92/mona-camera-dropbox-repro (GitHub)
