Beyond Position: the emergence of wavelet-like properties in Transformers

Valeria Ruscio; Umberto Nanni; Fabrizio Silvestri

arXiv:2410.18067·cs.LG·June 6, 2025


TL;DR

This paper reveals that Transformer models with Rotary Position Embeddings spontaneously develop wavelet-like, multi-resolution processing capabilities during training, which help overcome positional encoding limitations and enhance model effectiveness.

Contribution

The paper identifies emergent wavelet-like properties in RoPE-based Transformers, showing that their multi-resolution processing is specific to RoPE and develops in distinct phases during training.

Findings

01. Attention heads evolve to implement multi-resolution processing.

02. Wavelet-like properties are unique to RoPE-based Transformers.

03. Emergence of these properties follows distinct training phases.

Abstract

This paper studies how Transformer models with Rotary Position Embeddings (RoPE) develop emergent, wavelet-like properties that compensate for the positional encoding's theoretical limitations. Through an analysis spanning model scales, architectures, and training checkpoints, we show that attention heads evolve to implement multi-resolution processing analogous to wavelet transforms. We demonstrate that this scale-invariant behavior is unique to RoPE, emerges through distinct evolutionary phases during training, and statistically adheres to the fundamental uncertainty principle. Our findings suggest that the effectiveness of modern Transformers stems from their remarkable ability to spontaneously develop optimal, multi-resolution decompositions to address inherent architectural constraints.
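
For readers unfamiliar with RoPE, the following is a minimal sketch of the standard rotary embedding that the abstract refers to. It is not the paper's analysis code. The function names (`rope_frequencies`, `apply_rope`, `dot`), the base of 10000, the head dimension of 8, and the example vectors are all assumptions chosen for illustration. It shows two properties relevant to the multi-resolution claim: the rotation frequencies are geometrically spaced, so each dimension pair covers a different positional scale, and the query-key score depends only on the relative offset between positions.

```python
import math

def rope_frequencies(head_dim, base=10000.0):
    # One rotation frequency per pair of dimensions, geometrically spaced
    # from 1 down towards 1/base (the commonly used RoPE schedule).
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

def apply_rope(x, position, base=10000.0):
    # Rotate each consecutive (even, odd) pair of x by position * frequency.
    out = []
    for i, freq in enumerate(rope_frequencies(len(x), base)):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical query and key vectors for a single head of dimension 8.
q = [0.3, -1.2, 0.7, 0.1, -0.5, 0.9, 1.1, -0.4]
k = [1.0, 0.2, -0.6, 0.8, 0.4, -0.3, 0.5, 0.7]

# Same relative offset (3) at two different absolute positions.
score_near = dot(apply_rope(q, 5), apply_rope(k, 2))
score_far = dot(apply_rope(q, 105), apply_rope(k, 102))

# Wavelength (in tokens) of each dimension pair: short to long scales.
wavelengths = [2 * math.pi / f for f in rope_frequencies(8)]
```

Under these assumptions, `score_near` and `score_far` agree up to floating-point error, and `wavelengths` grows by a constant factor from one dimension pair to the next, which is the fixed multi-scale frequency structure on top of which the paper reports wavelet-like behavior emerging in trained attention heads.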


Taxonomy

Topics: Cephalopods and Marine Biology · Modular Robots and Swarm Intelligence · Optical measurement and interference techniques

Methods: Attention Is All You Need · ALIGN · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam