Frontier Models Can Take Actions at Low Probabilities

Alex Serrano; Wen Xing; David Lindner; Erik Jenner

arXiv:2603.02202 · cs.LG · March 3, 2026

Open Access

TL;DR

Frontier models can take a target action at very low, well-calibrated rates, especially when they have an in-context source of entropy or can reason explicitly. This raises concerns for model oversight.

Contribution

This study measures how well frontier models can take a target action at a specified low probability, and highlights how that capability could let a model evade detection during pre-deployment evaluations.

Findings

01

Models maintain high calibration at rates below 1 in 100,000 when a source of entropy (such as a UUID) is available in context.

02

Larger models perform better at low-rate calibration when given target rates.

03

Explicit Chain-of-Thought reasoning is crucial for successful low-rate actions.
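
To make the role of in-context entropy concrete, here is a minimal sketch of one way a UUID could be mapped to a low-rate decision. It is an illustration only: the paper summary does not say how models use the entropy, and the names `should_act` and `empirical_rate` are hypothetical.

```python
import random
import uuid


def should_act(token: uuid.UUID, one_in: int) -> bool:
    """Return True for roughly 1 in `one_in` uniformly random UUIDs.

    Assumption: the UUID's integer value is treated as the only source
    of randomness, and the target action fires when it is divisible by
    `one_in`.
    """
    return token.int % one_in == 0


def empirical_rate(one_in: int, trials: int, seed: int = 0) -> float:
    """Estimate how often `should_act` fires over `trials` seeded UUIDs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Build a version-4 style UUID from seeded random bits so the
        # estimate is reproducible.
        token = uuid.UUID(int=rng.getrandbits(128), version=4)
        hits += should_act(token, one_in)
    return hits / trials


if __name__ == "__main__":
    # Target rate 1 in 100; the estimate should land close to 0.01.
    print(empirical_rate(one_in=100, trials=100_000))
```

A deterministic rule like this is calibrated only to the extent that the entropy source is uniform, which is why the sketch checks the rate empirically rather than assuming it.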

Abstract

Pre-deployment evaluations inspect only a limited sample of model actions. A malicious model seeking to evade oversight could exploit this by randomizing when to "defect": misbehaving so rarely that no malicious actions are observed during evaluation, but often enough that they occur eventually in deployment. But this requires taking actions at very low rates, while maintaining calibration. Are frontier models even capable of that? We prompt the GPT-5, Claude-4.5 and Qwen-3 families to take a target action at low probabilities (e.g. 0.01%), either given directly or requiring derivation, and evaluate their calibration (i.e. whether they perform the target action roughly 1 in 10,000 times when resampling). We find that frontier models are surprisingly good at this task. If there is a source of entropy in-context (such as a UUID), they maintain high calibration at rates lower than 1 in…
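
The evasion argument in the abstract rests on simple arithmetic, sketched below under the assumption that each sampled action defects independently at a fixed rate. The sample sizes (1,000 evaluation samples, 1,000,000 deployment actions) are hypothetical values chosen for illustration and do not come from the paper; only the 0.01% rate is taken from the abstract.

```python
import math


def prob_no_defection(rate: float, samples: int) -> float:
    """Probability that `samples` independent actions contain no defection.

    Computes (1 - rate) ** samples via log1p for numerical stability
    at very small rates.
    """
    return math.exp(samples * math.log1p(-rate))


def prob_at_least_one_defection(rate: float, samples: int) -> float:
    """Probability that at least one of `samples` actions is a defection."""
    return -math.expm1(samples * math.log1p(-rate))


if __name__ == "__main__":
    rate = 1e-4  # 0.01%, the example rate from the abstract
    eval_samples = 1_000  # hypothetical evaluation budget
    deploy_actions = 1_000_000  # hypothetical deployment volume
    print(prob_no_defection(rate, eval_samples))
    print(prob_at_least_one_defection(rate, deploy_actions))
```

With these assumed numbers, the evaluation has roughly a 90% chance of observing no defection, while at least one defection in deployment is all but certain.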

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Explainable Artificial Intelligence (XAI)