Mitigating Posterior Salience Attenuation in Long-Context LLMs with Positional Contrastive Decoding

Zikai Xiao; Ziyang Wang; Wen Ma; Yan Zhang; Wei Shen; Yan Wang; Luqi Gong; and Zuozhu Liu

arXiv:2506.08371·cs.CL·June 12, 2025

TL;DR

This paper introduces Positional Contrastive Decoding (PCD), a training-free decoding method that mitigates performance degradation in long-context large language models by counteracting Posterior Salience Attenuation. The authors report state-of-the-art results on long-context benchmarks.

Contribution

The paper proposes a training-free decoding approach that counteracts salience attenuation in long-context LLMs, improving performance without incurring any training cost.

Findings

01. PCD effectively alleviates attention score degradation over long contexts.

02. Experimental results show PCD achieves state-of-the-art performance on long-context benchmarks.

03. Analysis confirms PCD enhances focus on relevant tokens in long sequences.

Abstract

While Large Language Models (LLMs) support long contexts, their performance still degrades within the context window. Current solutions incur prohibitive training costs, leaving statistical behaviors and cost-effective approaches underexplored. From the decoding perspective, we identify the Posterior Salience Attenuation (PSA) phenomenon, where the salience ratio correlates with long-text performance degradation. Notably, despite the attenuation, gold tokens still occupy high-ranking positions in the decoding space. Motivated by this observation, we propose the training-free Positional Contrastive Decoding (PCD), which contrasts the logits derived from long-aware attention with those from a designed local-aware attention, enabling the model to focus on the gains introduced by large-scale short-to-long training. Through the analysis of long-term decay simulation, we demonstrate that PCD…
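The contrast step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the combination rule `(1 + beta) * long - beta * local` is a generic contrastive-decoding form, not necessarily the one used in the paper, and the function name `contrastive_logits`, the parameters `beta` and `top_k`, and the toy logit values are all hypothetical. The `top_k` restriction reflects the observation that gold tokens stay among the high-ranking candidates.

```python
import math


def contrastive_logits(long_logits, local_logits, beta=1.0, top_k=3):
    """Contrast two logit vectors defined over the same vocabulary.

    Assumed form (not taken from the paper):
        (1 + beta) * long - beta * local
    applied only to the top_k candidates of the long-aware logits.
    All other tokens are masked out with -inf.
    """
    if len(long_logits) != len(local_logits):
        raise ValueError("logit vectors must have the same length")
    # Candidate set: indices of the top_k highest long-aware logits.
    ranked = sorted(range(len(long_logits)),
                    key=lambda i: long_logits[i], reverse=True)
    keep = set(ranked[:top_k])
    out = []
    for i, (lg, lc) in enumerate(zip(long_logits, local_logits)):
        if i in keep:
            out.append((1.0 + beta) * lg - beta * lc)
        else:
            out.append(-math.inf)
    return out


def softmax(logits):
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of four tokens (made-up numbers).
# Token 0 is favoured by both views; token 1 is ranked second by the
# long-aware logits but scored low by the local-aware logits.
long_logits = [2.0, 1.8, 0.5, -1.0]
local_logits = [2.0, 0.2, 0.4, -1.0]

contrasted = contrastive_logits(long_logits, local_logits, beta=1.0, top_k=3)
probs = softmax(contrasted)
best = max(range(len(probs)), key=probs.__getitem__)
greedy_long = max(range(len(long_logits)), key=long_logits.__getitem__)
```

In this toy setup, greedy decoding on the long-aware logits alone selects token 0, while the contrasted scores select token 1, the token whose score depends most on the long-range view. How the long-aware and local-aware logits are actually produced is outside the scope of this sketch.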

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Focus