Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging

Jon-Paul Cacioli

arXiv:2604.26206 · cs.CL · April 30, 2026


TL;DR

This study examines whether large language models that are prompted to sandbag settle into a stable response-position distribution. Under cyclic option-order rotation, that distribution barely changes. This points to a soft, content-invariant attractor rather than deterministic position-tracking.

Contribution

The paper shows that prompted sandbagging can produce a stable, content-invariant response-position distribution. This indicates a soft distributional attractor at the model level rather than an artefact of dataset-level distractor structure.

Findings

01

Response-position distribution remains highly stable under content rotation (Pearson r = 0.9994).

02

Accuracy peaks at 72.1% when the correct answer is in position E.

03

Qwen-2.5-7B shows no distributional shift, serving as a negative control.
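Finding 01, together with the abstract's Jensen-Shannon figures, compares two response-position distributions: the share of answers landing on each letter with the original option order and with rotated order. The sketch below shows one way to compute these comparisons. It rests on several assumptions. Items have up to ten options (A-J), logs are base 2 so the divergence lies in [0, 1], and Pearson r is taken across the per-letter proportions. The function names are hypothetical and are not the paper's code.

```python
import math
from collections import Counter

LETTERS = "ABCDEFGHIJ"  # assumption: up to ten answer options, A-J


def position_distribution(responses, letters=LETTERS):
    """Fraction of responses landing on each answer letter (unparsed answers are ignored)."""
    counts = Counter(responses)
    total = sum(counts[letter] for letter in letters)
    if total == 0:
        raise ValueError("no responses matched any answer letter")
    return [counts[letter] / total for letter in letters]


def _kl_base2(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed base 2, so bounded by [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution, positive wherever p or q is
    return 0.5 * _kl_base2(p, m) + 0.5 * _kl_base2(q, m)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

To apply it, pass the letters a model chose under sandbagging with the original order, and separately with rotated order, to `position_distribution`. Then compare the two results with `pearson_r` and `js_divergence`. Near-identical distributions give r close to 1 and a divergence close to 0, which is the pattern Finding 01 describes. Note that `scipy.spatial.distance.jensenshannon` returns the square root of the divergence, so its values are not directly comparable with this function's output.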

Abstract

A predecessor pilot (Cacioli, 2026) found that Llama-3-8B implements prompted sandbagging as positional collapse rather than answer avoidance. However, fixed option ordering in MMLU-Pro left open whether this reflected a model-level position-dominant policy or dataset-level distractor structure. This pre-registered follow-up (3 models, 2,000 MMLU-Pro items, 4 conditions, 24,000 primary trials) added cyclic option-order randomisation as the critical control. The pre-registered item-level same-letter diagnostic did not confirm deterministic position-tracking (same-letter rate 37.3%, below the 50% threshold). However, pre-specified supporting analyses revealed that the response-position distribution under sandbagging was highly stable under complete content rotation (Pearson r = 0.9994; Jensen-Shannon divergence = 0.027, compared to 0.386 between honest and sandbagging conditions).…
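The abstract does not spell out the rotation scheme or the exact definition of the same-letter diagnostic. The sketch below assumes that each item's options are shifted cyclically by an offset. It also assumes the same-letter rate is the fraction of items on which the model picks the same letter under both the original and the rotated order. Under those assumptions, deterministic position-tracking would push the rate towards 1.0, because the model would keep choosing the same slot whatever content now occupies it. The helper names are hypothetical.

```python
from string import ascii_uppercase


def rotate_options(options, shift):
    """Cyclically rotate options: the option at index i moves to index (i + shift) mod n."""
    n = len(options)
    rotated = [None] * n
    for i, option in enumerate(options):
        rotated[(i + shift) % n] = option
    return rotated


def original_index(letter, shift, n):
    """Map a letter chosen in the rotated presentation back to that option's original index."""
    return (ascii_uppercase.index(letter) - shift) % n


def same_letter_rate(letters_original, letters_rotated):
    """Fraction of items where the model picks the same letter in both presentations."""
    if len(letters_original) != len(letters_rotated):
        raise ValueError("need one response per item in each presentation")
    if not letters_original:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(letters_original, letters_rotated))
    return matches / len(letters_original)
```

A rate like the reported 37.3% falls below the 50% threshold, so it does not support deterministic position-tracking. The distribution-level comparison can still show that the overall letter distribution survives rotation. `original_index` makes it possible to ask the complementary question: whether the model kept choosing the same content after that content moved.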
