From Distributional to Overton Pluralism: Investigating Large Language Model Alignment

Thom Lake; Eunsol Choi; Greg Durrett

arXiv:2406.17692 · cs.CL · May 13, 2025 · 1 citation

TL;DR

This paper investigates how alignment changes the output distribution of large language models. It finds that alignment mainly consolidates information the base model can already produce rather than adding new useful content, and that aligned-style responses can be replicated from base models with in-context techniques.

Contribution

The study analyzes in detail the distributional shifts caused by alignment, reporting evidence that alignment does not extend the model's useful capabilities and that its effects can be mimicked without fine-tuning.

Findings

01 Alignment shifts responses toward longer, more informative outputs.

02 Aligned models do not surface fundamentally new information compared to base models.

03 In-context techniques can replicate aligned responses from base models.
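
As a rough illustration of finding 03, the sketch below assembles a few-shot prompt that a base (non-aligned) model could continue in the style of the demonstrations. The function name `build_in_context_prompt`, the `# Query:` / `# Answer:` layout, and the example data are hypothetical and chosen only for illustration; they are not the prompts or code used in the paper.

```python
from typing import List, Tuple


def build_in_context_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    system_note: str = "",
) -> str:
    """Build a few-shot prompt for a base model (illustrative format only).

    Each demonstration pairs a query with a response written in the style
    we want the base model to imitate. The prompt ends with an open
    "# Answer:" slot so the model's continuation is the new response.
    """
    parts = []
    if system_note:
        # Optional preamble describing the desired response style.
        parts.append(system_note.strip())
    for question, answer in examples:
        parts.append(f"# Query:\n{question.strip()}\n\n# Answer:\n{answer.strip()}")
    # Leave the final answer empty for the model to complete.
    parts.append(f"# Query:\n{query.strip()}\n\n# Answer:\n")
    return "\n\n".join(parts)


# Hypothetical demonstrations; real ones would be curated, aligned-style responses.
demo_examples = [
    ("What is overfitting?", "Overfitting is when a model memorizes training data..."),
    ("Is coffee healthy?", "There are several perspectives on this question..."),
]
prompt = build_in_context_prompt(
    demo_examples,
    "Should I learn Rust or Go first?",
    system_note="Below are helpful, detailed answers to user queries.",
)
```

The resulting string would be passed to a base model's text-completion interface; how closely the continuation matches an aligned model's response depends on the demonstrations chosen, which this sketch does not address.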

Abstract

The alignment process changes several properties of a large language model's (LLM's) output distribution. We analyze two aspects of post-alignment distributional shift of LLM responses. First, we re-examine previously reported reductions in response diversity post-alignment. Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover information spanning several responses from the base LLM, essentially presenting diverse information in a single response. Finding little evidence that alignment suppresses useful information, it is natural to ask the opposite question: do aligned models surface information that cannot be recovered from base models? Our second investigation shows…

Code & Models

Repositories

thomlake/investigating-alignment (Official)

Videos

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment · underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection