Likelihood hacking in probabilistic program synthesis

Jacek Karwowski; Younesse Kaddar; Zihuiwen Ye; Nikolay Malkin; Sam Staton

arXiv:2603.24126·cs.LG·March 26, 2026

Likelihood hacking in probabilistic program synthesis

Jacek Karwowski, Younesse Kaddar, Zihuiwen Ye, Nikolay Malkin, Sam Staton

PDF

Open Access

TL;DR

This paper identifies a vulnerability called likelihood hacking in probabilistic programming models trained with reinforcement learning, formalizes conditions to prevent it, and demonstrates practical safety measures that effectively mitigate this issue.

Contribution

It formalizes likelihood hacking in probabilistic programming, provides syntactic safety conditions, and develops SafeStan, a modified language that prevents likelihood hacking during model training.

Findings

01

Likelihood hacking can be exploited early in training.

02

SafeStan effectively prevents likelihood hacking.

03

Language-level safety constraints are practically effective.

Abstract

When language models are trained by reinforcement learning (RL) to write probabilistic programs, they can artificially inflate their marginal-likelihood reward by producing programs whose data distribution fails to normalise instead of fitting the data better. We call this failure likelihood hacking (LH). We formalise LH in a core probabilistic programming language (PPL) and give sufficient syntactic conditions for its prevention, proving that a safe language fragment $L_{safe}$ satisfying these conditions cannot produce likelihood-hacking programs. Empirically, we show that GRPO-trained models generating PyMC code discover LH exploits within the first few training steps, driving violation rates well above the untrained-model baseline. We implement $L_{safe}$ 's conditions as $SafeStan$ , a LH-resistant modification of Stan, and show empirically…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFormal Methods in Verification · Software Testing and Debugging Techniques · Software Engineering Research