Quantisation Reshapes the Metacognitive Geometry of Language Models
Jon-Paul Cacioli

TL;DR
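Quantisation restructures the domain-level metacognitive efficiency profile of a language model rather than degrading it uniformly: which knowledge domains are well or poorly monitored differs between Q5_K_M and f16, even though the underlying confidence discrimination signal is unchanged.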
Contribution
This study shows that model quantisation reshapes the metacognitive profile of an LLM, challenging the assumption of uniform degradation and showing that a metacognitive diagnosis made at one precision format need not hold at another.
Findings
M-ratio profiles across four knowledge domains are uncorrelated between Q5_K_M and f16 (Spearman rho = 0.00).
Confidence-amplification training improves confidence distributions but not meta-d' transfer.
Type-2 AUROC (AUROC_2) profiles remain perfectly stable across formats (Spearman rho = 1.00).
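For reference, the M-ratio named in the findings above is conventionally defined in the signal detection literature as the ratio of metacognitive sensitivity (meta-d') to first-order task sensitivity (d'). The block below states that standard definition; it is assumed, not confirmed by this page, that the paper follows the same convention.

```latex
\[
  \text{M-ratio} = \frac{\text{meta-}d'}{d'}
\]
```

Under this convention, a value of 1 means confidence ratings discriminate correct from incorrect answers as well as the first-order sensitivity would predict, and values below 1 indicate under-monitoring. Because d' sits in the denominator, the ratio can change when first-order sensitivity changes even if the confidence signal itself does not, which is one way to read the contrast between the M-ratio and AUROC_2 findings.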
Abstract
We report that model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly. Evaluating Llama-3-8B-Instruct on the same 3,000 questions at Q5_K_M and f16 precision, we find that M-ratio profiles across four knowledge domains are uncorrelated between formats (Spearman rho = 0.00). Arts & Literature moves from worst-monitored (M-ratio = 0.606 at Q5_K_M) to best-monitored (1.542 at f16). Geography moves from well-monitored (1.210) to under-monitored (0.798). However, Type-2 AUROC profiles are perfectly stable across formats (rho = 1.00), localising the restructuring to the M-ratio normalisation rather than the underlying discrimination signal. This finding emerged from a pre-registered attempt to improve metacognition through domain-conditional training. We prescribed confidence-amplification SFT for the diagnosed weak domain, with…
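The two profile statistics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's analysis code: the function names `type2_auroc`, `average_ranks`, and `spearman_rho` are hypothetical, and the domain values at the bottom are toy numbers chosen to show an uncorrelated profile, not results from the paper.

```python
from itertools import product


def type2_auroc(confidence, correct):
    """Type-2 AUROC: probability that a randomly chosen correct trial carries
    higher confidence than a randomly chosen incorrect trial (ties count 0.5)."""
    hits = [c for c, ok in zip(confidence, correct) if ok]
    misses = [c for c, ok in zip(confidence, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect trial")
    wins = sum(
        1.0 if h > m else 0.5 if h == m else 0.0
        for h, m in product(hits, misses)
    )
    return wins / (len(hits) * len(misses))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Toy per-domain profiles (hypothetical values, NOT the paper's data).
profile_format_a = [0.70, 0.90, 1.10, 1.30]
profile_format_b = [0.95, 1.40, 0.75, 1.20]
rho = spearman_rho(profile_format_a, profile_format_b)
```

With four domains, a rank ordering such as 1, 2, 3, 4 against 2, 4, 1, 3 gives a Spearman coefficient of exactly zero, which is how a profile can be fully reshuffled without being inverted.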
