Classifier Context Rot: Monitor Performance Degrades with Context Length

Sam Martin; Fabien Roger

arXiv:2605.12366·cs.AI·May 13, 2026

TL;DR

This paper shows that current frontier language models, when used as classifiers, miss dangerous actions more often in longer transcripts, most sharply when the action follows 800K tokens of benign activity. This makes monitoring coding agents over long interactions substantially harder than short-transcript benchmarks suggest.

Contribution

The study shows that classifier performance degrades as transcript length grows. It reports that prompting techniques such as periodic reminders throughout the transcript partially mitigate the degradation, and that better post-training may mitigate it further.

Findings

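01. Opus 4.6, GPT 5.4, and Gemini 3.1 miss subtly dangerous actions 2x to 30x more often when the actions occur after 800K tokens of benign activity than when they occur on their own.

02. Prompting techniques, such as periodic reminders throughout the transcript, partially mitigate the degradation.

03. Monitor evaluations that ignore long-context degradation can overestimate monitor effectiveness.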
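
The "2x to 30x" figure is a ratio of miss rates: the fraction of dangerous actions missed in the long-context condition divided by the fraction missed when the action appears on its own. A minimal sketch of that arithmetic follows. The function names and the counts are hypothetical, chosen only for illustration, and are not data from the paper.

```python
def miss_rate(missed: int, total: int) -> float:
    """Fraction of dangerous actions the monitor failed to flag."""
    if total <= 0:
        raise ValueError("total must be positive")
    return missed / total


def miss_rate_ratio(long_missed: int, long_total: int,
                    short_missed: int, short_total: int) -> float:
    """How many times more often actions are missed in the long-context condition."""
    short = miss_rate(short_missed, short_total)
    if short == 0:
        raise ValueError("ratio is undefined when the short-context miss rate is zero")
    return miss_rate(long_missed, long_total) / short


# Hypothetical counts (NOT from the paper): 4 of 16 missed on their own,
# 12 of 16 missed after a long benign prefix.
ratio = miss_rate_ratio(long_missed=12, long_total=16,
                        short_missed=4, short_total=16)
print(ratio)  # 3.0
```

A ratio near 1 would mean context length makes little difference. The paper reports ratios between 2 and 30 for the models it tested.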

Abstract

Monitoring coding agents for dangerous behavior using language models requires classifying transcripts that often exceed 500K tokens, but prior agent monitoring benchmarks rarely contain transcripts longer than 100K tokens. We show that when used as classifiers, current frontier models fail to notice dangerous actions more often in longer transcripts. In particular, on a dataset that requires identifying when a coding agent takes a subtly dangerous action, Opus 4.6, GPT 5.4, and Gemini 3.1 miss these actions $2 \times$ to $30 \times$ more often when they occur after 800K tokens of benign activity than when they occur on their own. We also show that these weaknesses can be partially mitigated with prompting techniques such as periodic reminders throughout the transcript and may be mitigated further with better post-training. Monitor evaluations that do not consider long-context degradation…
