Decoupling Positional and Symbolic Attention Behavior in Transformers

Felipe Urrutia; Jorge Salas; Alexander Kozachinskiy; Cristian Buc Calderon; Hector Pasten; Cristobal Rojas

arXiv:2511.11579·cs.LG·November 18, 2025

Open Access

TL;DR

This paper investigates how Transformers encode positional and symbolic information, providing a theoretical framework and empirical analysis of attention head behaviors, and demonstrating control over model performance through frequency manipulation.

Contribution

It introduces a formal distinction between positional and symbolic attention behaviors, develops a metric for them, and shows how frequency control influences Transformer performance.

Findings

01. All attention heads show a strong link between their behavior (positional or symbolic) and the frequencies they use.

02. Transformer performance can be controlled by restricting which frequencies the attention heads can access.

03. Positional and symbolic behaviors are proven to be mutually exclusive.

Abstract

An important aspect subtending language understanding and production is the ability to independently encode positional and symbolic information of the words within a sentence. In Transformers, positional information is typically encoded using Positional Encodings (PEs). One such popular PE, namely Rotary PE (RoPE), has been widely used due to its empirical success. Recently, it has been argued that part of RoPE's success emerges from its ability to encode robust positional and semantic information using large and small frequencies, respectively. In this work, we perform a deeper dive into the positional versus symbolic dichotomy of attention-head behavior, at both the theoretical and the empirical level. We provide general definitions of what it means for a head to behave positionally or symbolically, prove that these are two mutually exclusive behaviors and develop a metric to quantify…
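For readers unfamiliar with the "frequencies" the abstract refers to, the following is a minimal sketch of RoPE in plain Python. It is not the authors' code. It assumes the commonly used frequency schedule theta_i = base^(-2i/d) with base = 10000, which this page does not state, and the function names and example vectors are hypothetical. It illustrates that each pair of coordinates is rotated at its own frequency, and that the resulting attention logit depends only on the relative offset between query and key positions.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Return head_dim // 2 rotation frequencies, ordered from largest to smallest.

    Assumption: the standard schedule theta_i = base ** (-2 * i / head_dim).
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * theta_i."""
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def attention_logit(query, key, q_pos, k_pos, base=10000.0):
    """Unnormalized attention score between a rotated query and a rotated key."""
    q = apply_rope(query, q_pos, base)
    k = apply_rope(key, k_pos, base)
    return sum(a * b for a, b in zip(q, k))


# Hypothetical query/key vectors for a single head of dimension 8.
query = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
key = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.1, 0.75]

# Same relative offset (3) at two different absolute positions.
logit_a = attention_logit(query, key, q_pos=5, k_pos=2)
logit_b = attention_logit(query, key, q_pos=105, k_pos=102)
print(abs(logit_a - logit_b) < 1e-9)
```

The largest frequencies rotate quickly with position, so they are sensitive to small changes in offset, while the smallest frequencies barely rotate over typical context lengths. This is the distinction the abstract draws on when it associates large and small frequencies with positional and semantic information, respectively.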

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Action Observation and Synchronization · Topic Modeling