Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters

Bryan Sanchez

arXiv:2604.14174 · cs.CL · April 20, 2026

TL;DR

This paper introduces a post-transformer adapter that corrects suppressed factual log-probabilities in alignment-tuned language models on politically sensitive topics, adding only about 786K parameters (approximately 0.02% of the base model).

Contribution

It demonstrates that a small adapter trained on frozen hidden states can correct factual suppression across three model scales (Qwen3-4B, 8B, and 14B) and generalize to some unseen facts.

Findings

01

The adapter memorizes all 15 training facts and generalizes to 11–39% of the 16 held-out facts across 5 random splits per scale.

02

Applying the adapter only at the last token position yields coherent, less censored text.

03

A silent gradient bug in Apple MLX caused null results in earlier experiments.

Abstract

Alignment-tuned language models frequently suppress factual log-probabilities on politically sensitive topics despite retaining the knowledge in their hidden representations. We show that a 786K-parameter (approximately 0.02% of the base model) post-transformer adapter, trained on frozen hidden states, corrects this suppression on 31 ideology-discriminating facts across Qwen3-4B, 8B, and 14B. The adapter memorizes all 15 training facts and generalizes to 11–39% of 16 held-out facts across 5 random splits per scale, with zero knowledge regressions via anchored training. Both gated (SwiGLU) and ungated (linear bottleneck) adapters achieve comparable results; neither consistently outperforms the other (Fisher exact p > 0.09 at all scales). On instruct models, the adapter corrects log-probability rankings. When applied at all token positions during generation, the adapter produces…
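The two adapter variants named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the residual form, the zero-initialized up-projection, the initialization scale, the bottleneck width, and all names (`LinearBottleneckAdapter`, `SwiGLUAdapter`, `apply_at_last_position`) are assumptions made for illustration. Plain Python lists stand in for the frozen hidden states that a real setup would take from the base model.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def silu(z):
    """SiLU (swish) activation, the nonlinearity inside SwiGLU gating."""
    return z / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng, scale=0.02):
    # Assumed initialization scale; the source does not specify one.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LinearBottleneckAdapter:
    """Ungated variant (assumed form): h + W_up @ (W_down @ h)."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.d_model, self.d_bottleneck = d_model, d_bottleneck
        self.W_down = random_matrix(d_bottleneck, d_model, rng)
        # Assumption: zero-init the up-projection so the untrained
        # adapter leaves the frozen hidden state unchanged.
        self.W_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h):
        delta = matvec(self.W_up, matvec(self.W_down, h))
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        return 2 * self.d_model * self.d_bottleneck


class SwiGLUAdapter(LinearBottleneckAdapter):
    """Gated variant (assumed form): h + W_up @ (silu(W_gate @ h) * (W_down @ h))."""

    def __init__(self, d_model, d_bottleneck, rng):
        super().__init__(d_model, d_bottleneck, rng)
        self.W_gate = random_matrix(d_bottleneck, d_model, rng)

    def __call__(self, h):
        gate = [silu(g) for g in matvec(self.W_gate, h)]
        down = matvec(self.W_down, h)
        delta = matvec(self.W_up, [g * d for g, d in zip(gate, down)])
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        return 3 * self.d_model * self.d_bottleneck


def apply_at_last_position(adapter, hidden_states):
    """Adapt only the final token's hidden state; earlier positions pass through."""
    return hidden_states[:-1] + [adapter(hidden_states[-1])]


rng = random.Random(0)
adapter = SwiGLUAdapter(d_model=8, d_bottleneck=2, rng=rng)
sequence = [[float(t)] * 8 for t in range(1, 4)]  # three toy token states
adapted = apply_at_last_position(adapter, sequence)
```

In this sketch the adapted hidden state would then be passed to the base model's unchanged output head, so only the adapter's weights need training. The toy sizes above are illustrative and are not chosen to reproduce the 786K-parameter figure.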
