Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models

Junjie Wu; Gefei Gu; Yanan Zheng; Dit-Yan Yeung; Arman Cohan

arXiv:2507.09506·cs.CL·August 5, 2025

TL;DR

Ref-Long introduces a new benchmark for evaluating long-context referencing in language models. It reveals significant challenges even for advanced models such as GPT-4o and offers insights through extensive analysis.

Contribution

This paper presents Ref-Long, a novel benchmark for assessing long-context referencing in long-context language models (LCLMs). It highlights their current shortcomings and offers a comprehensive analysis.

Findings

1. Significant referencing shortcomings in 13 LCLMs, including GPT-4o.
2. Ref-Long's diverse scenarios reveal model limitations.
3. The analysis suggests a need for improved long-context understanding.

Abstract

Long-context language models (LCLMs) have exhibited impressive capabilities in long-context understanding tasks. Among these, long-context referencing -- a crucial task that requires LCLMs to attribute items of interest to specific parts of long-context data -- remains underexplored. To bridge this gap, this paper proposes Referencing Evaluation for Long-context Language Models (Ref-Long), a novel benchmark designed to assess the long-context referencing capability of LCLMs. Specifically, Ref-Long requires LCLMs to identify the indexes of documents that reference a specific key, emphasizing contextual relationships between the key and the documents over simple retrieval. Based on the task design, we construct three subsets ranging from synthetic to realistic scenarios to form the Ref-Long benchmark. Experimental results on 13 LCLMs reveal significant shortcomings in long-context…
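To make the task format concrete, here is a minimal Python sketch of a toy referencing query and how a predicted set of document indexes might be scored. Everything in it is an assumption for illustration. The `ReferencingItem` class, the helpers `make_synthetic_item`, `build_prompt`, `parse_indexes`, and `score`, the prompt wording, and the exact-match/F1 metrics are hypothetical and are not taken from the Ref-Long paper or the wujunjie1998/ref-long repository. Because the key appears verbatim in this toy, plain keyword matching would solve it. The benchmark itself is described as stressing contextual relationships between the key and the documents rather than simple retrieval.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class ReferencingItem:
    """Hypothetical container for one referencing query (not Ref-Long's released schema)."""
    documents: list[str]  # the long context, split into numbered documents
    key: str              # the item of interest to attribute
    gold: set[int]        # 1-based indexes of documents that reference the key


def make_synthetic_item(n_docs: int, key: str, n_refs: int, seed: int = 0) -> ReferencingItem:
    """Build a toy item; the gold index set is known by construction."""
    rng = random.Random(seed)
    gold = set(rng.sample(range(1, n_docs + 1), n_refs))
    documents = []
    for i in range(1, n_docs + 1):
        text = f"Filler text for document {i}."
        if i in gold:
            # Only the chosen documents mention the key.
            text += f" This document mentions {key}."
        documents.append(text)
    return ReferencingItem(documents, key, gold)


def build_prompt(item: ReferencingItem) -> str:
    """Concatenate indexed documents and append the question (hypothetical wording)."""
    parts = [f"Document [{i}]: {doc}" for i, doc in enumerate(item.documents, start=1)]
    parts.append(f"Question: list the indexes of all documents that reference '{item.key}'.")
    return "\n\n".join(parts)


def parse_indexes(output: str) -> set[int]:
    """Treat every integer in a model reply as a predicted document index."""
    return {int(m) for m in re.findall(r"\d+", output)}


def score(predicted: set[int], gold: set[int]) -> dict:
    """Exact set match plus set-level F1 (metrics assumed for this sketch)."""
    if not predicted and not gold:
        return {"exact_match": True, "f1": 1.0}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"exact_match": predicted == gold, "f1": f1}


item = make_synthetic_item(n_docs=10, key="the blue lantern", n_refs=3, seed=42)
prompt = build_prompt(item)  # this string would be sent to an LCLM
reply = ", ".join(str(i) for i in sorted(item.gold))  # stand-in for a model's answer
print(score(parse_indexes(reply), item.gold))  # {'exact_match': True, 'f1': 1.0}
```

Taking the gold set from the construction step, rather than recomputing it by string matching, reflects how a synthetic subset can supply reliable labels. Realistic subsets would need labels grounded in the documents' actual content.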


Code & Models

Repositories

wujunjie1998/ref-long
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques