Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

Jinchang Zhu; Jindong Li; Chengyu Zou; Rong Fu; Chao Wang; Haowei He; and Menglin Yang

arXiv:2605.10544·cs.CL·May 12, 2026

TL;DR

This paper introduces EXACT, a supervision-allocation objective for long-context adaptation of language models. It assigns extra weight to target tokens with long effective contexts and improves all 28 trained/extrapolated NoLiMa and RULER comparisons across seven Qwen/LLaMA CPT configurations.

Contribution

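The paper identifies a token-level supervision mismatch in packed training with document masking, where each target token's effective context remains short, and proposes EXACT, a supervision-allocation objective that assigns extra weight to long effective-context targets by inverse frequency within the long tail.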

Findings

01

EXACT improves all 28 trained/extrapolated NoLiMa and RULER comparisons across seven Qwen/LLaMA CPT configurations.

02

Gains arise when the evidence is thousands of tokens away; short-distance cases remain unchanged.

03

Standard QA and reasoning performance is preserved (+0.24 macro change across six benchmarks).

Abstract

Long-context adaptation is often viewed as window scaling, but this misses a token-level supervision mismatch: in packed training with document masking, each target token's effective context remains short. We introduce EXACT, a supervision-allocation objective that assigns extra weight to long effective-context targets by inverse frequency within the long tail. Across seven Qwen/LLaMA CPT configurations, EXACT improves all 28 trained/extrapolated NoLiMa and RULER comparisons. On Qwen2.5-0.5B, NoLiMa improves by +10.09 (trained) and +5.34 (extrapolated); RULER by +10.69 and +5.55. On LLaMA-3.2-3B, RULER improves by +17.91 and +16.11. Standard QA/reasoning are preserved (+0.24 macro change across six benchmarks). A distance-resolved probe shows gains arise when evidence is thousands of tokens away, while short cases remain unchanged. Results support a supervision-centric thesis:…
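
To make the mechanism in the abstract concrete, here is a minimal Python sketch of the idea, not the paper's implementation. It assumes that a target's effective context is the number of same-document tokens preceding it in a packed sequence, and it uses a hypothetical weighting rule: targets below a chosen `tail_start` keep weight 1.0, while targets in the long tail are binned by effective context and receive an extra weight inversely proportional to their bin's frequency. The function names and the parameters `tail_start`, `bin_width`, and `boost` are illustrative assumptions; the abstract does not specify the exact formula.

```python
from collections import Counter


def effective_context_lengths(doc_lengths):
    """Effective context of every target in one packed sequence.

    Assumption: with document masking, a token at offset i inside its own
    document can only attend to the i tokens before it in that document,
    no matter how long the packed window is.
    """
    lengths = []
    for n in doc_lengths:
        lengths.extend(range(n))
    return lengths


def tail_inverse_frequency_weights(ctx_lengths, tail_start, bin_width, boost=1.0):
    """Hypothetical weighting rule (not taken from the paper).

    Targets with effective context < tail_start keep weight 1.0.
    Tail targets are grouped into bins of width bin_width; each gets
    1 + boost * (rarest_bin_count / its_bin_count), so rarer bins get more.
    """
    def bin_of(c):
        return (c - tail_start) // bin_width

    counts = Counter(bin_of(c) for c in ctx_lengths if c >= tail_start)
    if not counts:
        return [1.0] * len(ctx_lengths)
    rarest = min(counts.values())
    return [
        1.0 if c < tail_start else 1.0 + boost * rarest / counts[bin_of(c)]
        for c in ctx_lengths
    ]


def weighted_loss(token_losses, weights):
    """Weighted mean of per-token losses."""
    total = sum(w * l for w, l in zip(weights, token_losses))
    return total / sum(weights)


# Four documents packed into one 19-token sequence.
ctx = effective_context_lengths([2, 3, 6, 8])
weights = tail_inverse_frequency_weights(ctx, tail_start=4, bin_width=2)
print(max(ctx), weights[-4:])
```

In this toy packing the window holds 19 tokens, yet no target sees more than 7 tokens of context, which illustrates the mismatch the abstract describes. The two targets with the longest effective context fall in the rarest bin and receive the largest weight.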
