Language models recognize dropout and Gaussian noise applied to their activations

Damiano Fornasiere; Mirko Bronzi; Spencer Kitts; Alessandro Palmas; Yoshua Bengio; Oliver Richardson

arXiv:2604.17465 · cs.AI · May 4, 2026

TL;DR

This paper provides evidence that language models can detect and localize dropout-style masking and Gaussian noise applied to their activations and, when taught in context, distinguish between the two kinds of perturbation.

Contribution

It shows that models from the Llama, Olmo, and Qwen families, with sizes between 8B and 32B, can detect and localize activation perturbations and, to a certain degree, verbalize the difference between them.

Findings

01

All tested models detect and localize the perturbations easily, often with perfect accuracy.

02

Qwen3-32B's zero-shot accuracy in identifying which perturbation was applied improves with perturbation strength.

03

When taught in context, models can learn to distinguish between dropout and Gaussian noise.

Abstract

We provide evidence that language models can detect, localize and, to a certain degree, verbalize the difference between perturbations applied to their activations. More precisely, we either (a) mask activations, simulating dropout, or (b) add Gaussian noise to them, at a target sentence. We then ask a multiple-choice question such as "Which of the previous sentences was perturbed?" or "Which of the two perturbations was applied?". We test models from the Llama, Olmo, and Qwen families, with sizes between 8B and 32B, all of which can easily detect and localize the perturbations, often with perfect accuracy. These models can also learn, when taught in context, to distinguish between dropout and Gaussian noise. Notably, Qwen3-32B's zero-shot accuracy in identifying which perturbation was applied improves as a function of the perturbation strength and, moreover, decreases if the in-context…
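The two perturbations described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function name `perturb_span`, the `[tokens, hidden]` activation layout, the use of a single `strength` parameter (drop probability for dropout, standard deviation for Gaussian noise), and the choice not to rescale surviving activations after masking. The abstract does not specify which layers are perturbed or how strength is parameterized.

```python
import numpy as np


def perturb_span(acts, start, end, kind, strength, rng):
    """Return a copy of `acts` with token rows start:end perturbed.

    acts: array of shape [tokens, hidden] (hypothetical layout).
    kind="dropout":  zero each entry independently with probability `strength`
                     (no rescaling of the kept entries; an assumption).
    kind="gaussian": add zero-mean noise with standard deviation `strength`.
    """
    out = acts.copy()
    span = out[start:end]
    if kind == "dropout":
        # keep[i, j] is True where the activation survives the mask
        keep = rng.random(span.shape) >= strength
        out[start:end] = span * keep
    elif kind == "gaussian":
        out[start:end] = span + rng.normal(0.0, strength, size=span.shape)
    else:
        raise ValueError(f"unknown perturbation kind: {kind!r}")
    return out


# Toy activations: 12 tokens, hidden size 8; the "target sentence" is tokens 4..7.
rng = np.random.default_rng(0)
acts = rng.normal(size=(12, 8))
dropped = perturb_span(acts, 4, 8, "dropout", 0.5, rng)
noisy = perturb_span(acts, 4, 8, "gaussian", 0.1, rng)
fully_masked = perturb_span(acts, 4, 8, "dropout", 1.0, rng)
```

In an actual experiment the perturbed activations would be written back into the model's forward pass at the target sentence's token positions before the multiple-choice question is asked; that step depends on the model and framework and is left out of this sketch.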
