Transformers perform adaptive partial pooling

Vsevolod Kapatsinski

arXiv:2602.03980 · cs.CL · February 5, 2026

TL;DR

This paper demonstrates that GPT-2 transformers exhibit adaptive partial pooling behavior similar to hierarchical regression, with pooling decreasing over training and influenced by context frequency and variability, reflecting realistic learning dynamics.

Contribution

It reveals that transformers perform adaptive partial pooling akin to hierarchical regression, influenced by context frequency and variability, and that pooling decreases with training epochs.

Findings

01

Pooling decreases with training epochs

02

Pooling is influenced by context frequency and variability

03

Transformer behavior aligns with hierarchical regression principles
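For reference, the textbook version of adaptive partial pooling is the normal-normal hierarchical model with known variances. It is not the paper's own analysis. The symbols below are assumptions introduced for illustration: $\bar y_j$ is the observed mean in context $j$, $n_j$ is that context's frequency, $\sigma^2$ is the within-context noise variance, $\mu$ is the mean across contexts, and $\tau^2$ is the between-context variance. The estimate for context $j$ is a weighted average of its own mean and the grand mean:

```latex
\hat\theta_j = (1 - w_j)\,\bar y_j + w_j\,\mu,
\qquad
w_j = \frac{\sigma^2 / n_j}{\sigma^2 / n_j + \tau^2}
```

The pooling weight $w_j$ approaches 1, meaning the estimate borrows almost entirely from other contexts, when $n_j$ is small (an infrequent context) or when $\tau^2$ is small (contexts behave alike). This matches the two conditions described in the abstract.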

Abstract

Because language is creative, any reasonable language model must generalize, deciding what to say in novel contexts by using information from similar contexts. But what about contexts that are not novel but merely infrequent? In hierarchical regression, the model's predictions for behavior in a context are affected by observations from other similar contexts to the extent that 1) the current context is infrequent and 2) different contexts behave similarly. This is called adaptive partial pooling of evidence. This paper shows that next-word predictions of a transformer (GPT2) are increasingly unaffected by observations from outside the current context across epochs of training (the amount of pooling reduces with training), and that the extent of pooling is affected by context frequency, context number (type frequency) and context variability in a similar way to hierarchical regression.…
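The two conditions in the abstract can be made concrete with a small numeric sketch. It assumes the textbook normal-normal hierarchical model with known variances and treats that model's shrinkage toward the grand mean as the measure of "pooling". All values are hypothetical, and the functions `pooling_weight` and `partial_pool` are illustrative names, not code from the paper. Because `mu` and `tau2` are fixed here rather than estimated from data, the sketch does not capture the effect of context number (type frequency).

```python
# Sketch of adaptive partial pooling in a normal-normal hierarchical model
# with known variances. Illustrative only; not the paper's method.


def pooling_weight(n_j: int, sigma2: float, tau2: float) -> float:
    """Share of the estimate drawn from the grand mean (1 = complete pooling, 0 = none).

    n_j    -- number of observations of context j (context frequency)
    sigma2 -- within-context noise variance (assumed known)
    tau2   -- between-context variance (assumed known; small = contexts behave alike)
    """
    return (sigma2 / n_j) / (sigma2 / n_j + tau2)


def partial_pool(y_bar_j: float, n_j: int, mu: float, sigma2: float, tau2: float) -> float:
    """Shrink the observed context mean y_bar_j toward the grand mean mu."""
    w = pooling_weight(n_j, sigma2, tau2)
    return (1 - w) * y_bar_j + w * mu


if __name__ == "__main__":
    mu, sigma2 = 0.5, 1.0  # hypothetical grand mean and within-context noise variance
    for tau2 in (0.1, 1.0):  # similar contexts vs. variable contexts
        for n_j in (1, 10, 100):  # rare vs. frequent context
            w = pooling_weight(n_j, sigma2, tau2)
            est = partial_pool(0.9, n_j, mu, sigma2, tau2)
            print(f"tau2={tau2:<4} n_j={n_j:<4} pooling={w:.3f} estimate={est:.3f}")
```

Running it shows that pooling shrinks as `n_j` grows, and that for a fixed `n_j` it is weaker when `tau2` is larger, so a rare context in a family of similar contexts is pulled hardest toward the grand mean.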

Taxonomy

Topics: Language and cultural evolution · Topic Modeling · Language Development and Disorders