Order-Level Attention Similarity Across Language Models: A Latent Commonality

Jinglin Liang, Jin Zhong, Shuangping Huang, Yunqing Hu, Huiyuan Zhang, Huifang Li, Lixin Fan, and Hanlin Gu

arXiv:2511.05064·cs.CL·November 10, 2025




TL;DR

This paper investigates common patterns in attention mechanisms across different language models, revealing similarities at the order level, and introduces a training-free adapter that leverages these patterns for improved cross-model transfer.

Contribution

It introduces Order-Level Attention (OLA) to analyze attention similarities across LMs and proposes the Transferable OLA Adapter (TOA) for effective cross-LM transfer without additional training.

Findings

01. OLA shows significant similarity across different LMs at the same order.

02. The TOA method improves performance on unseen LMs without parameter updates.

03. Cross-LM transfer is effectively enhanced using OLA-based features.

Abstract

In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated context aggregation or attention weights in LMs, they typically focus on individual models or attention heads, lacking a systematic analysis across multiple LMs to explore their commonalities. In contrast, we focus on the commonalities among LMs, which can deepen our understanding of LMs and even facilitate cross-model knowledge transfer. In this work, we introduce the Order-Level Attention (OLA) derived from the order-wise decomposition of Attention Rollout and reveal that the OLA at the same order across LMs exhibits significant similarities. Furthermore, we discover an implicit mapping between OLA and syntactic knowledge. Based on these two findings, we propose the Transferable OLA Adapter (TOA),…
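The abstract describes OLA as coming from an order-wise decomposition of Attention Rollout but does not give the formula here. The sketch below shows one plausible reading, not the authors' implementation: rollout is taken as the product of 0.5 * (I + A_l) over layers, and expanding that product groups the terms by how many attention matrices each one contains (its "order"). The function names, the per-order averaging, and the random attention matrices are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb

import numpy as np


def random_attention(rng, n):
    """Hypothetical stand-in for one layer's head-averaged attention (rows sum to 1)."""
    w = rng.random((n, n))
    return w / w.sum(axis=1, keepdims=True)


def attention_rollout(attns):
    """Rollout with residual connections: product of 0.5 * (I + A_l), later layers on the left."""
    n = attns[0].shape[0]
    out = np.eye(n)
    for a in attns:
        out = 0.5 * (np.eye(n) + a) @ out
    return out


def order_level_terms(attns):
    """Assumed order-wise decomposition: terms[k] averages every product of k
    attention matrices taken from distinct layers, with layer order preserved."""
    num_layers = len(attns)
    n = attns[0].shape[0]
    terms = []
    for k in range(num_layers + 1):
        acc = np.zeros((n, n))
        for layers in combinations(range(num_layers), k):
            prod = np.eye(n)
            for i in layers:  # ascending layer index, later layers applied on the left
                prod = attns[i] @ prod
            acc += prod
        terms.append(acc / comb(num_layers, k))
    return terms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attns = [random_attention(rng, 6) for _ in range(4)]
    terms = order_level_terms(attns)
    # Rollout is recovered as a binomially weighted sum of the per-order terms.
    weights = [comb(len(attns), k) / 2 ** len(attns) for k in range(len(terms))]
    recon = sum(w * t for w, t in zip(weights, terms))
    print(np.allclose(recon, attention_rollout(attns)))
```

Under this reading, each order-k term is itself a row-stochastic map of shape (sequence length × sequence length), so maps of the same order from models with different depths can be compared directly. The enumeration above visits every subset of layers and is only practical for a handful of layers; it is meant to make the decomposition concrete, not to be efficient.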



Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning