LooGLE: Can Long-Context Language Models Understand Long Contexts?
Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang

TL;DR
LooGLE is a new benchmark designed to evaluate large language models' ability to understand and process long contexts, revealing current limitations and guiding future improvements in long-dependency tasks.
Contribution
This paper introduces LooGLE, a comprehensive long-context evaluation benchmark with high-quality, recent documents and questions, addressing gaps in existing datasets and providing a systematic assessment of LLMs' long dependency understanding.
Findings
Commercial models outperform open-source models.
Models excel at short-dependency tasks but struggle with long-dependency tasks.
Retrieval-based techniques substantially improve short question-answering, whereas extending the context window has limited effect on long-context understanding.
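To make the retrieval finding concrete, here is a minimal sketch of retrieval-style context selection for a short question over a long document. It is not the pipeline used in LooGLE: the function names (split_into_chunks, overlap_score, retrieve_top_k), the word-overlap scoring, and the chunk size are illustrative assumptions. The idea is only that a short-dependency question can often be answered from a few relevant chunks, so the model never needs to see the full document.

```python
import re
from collections import Counter


def split_into_chunks(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, chunk):
    """Count question tokens that also occur in the chunk (hypothetical scoring rule)."""
    q_counts = Counter(tokenize(question))
    c_counts = Counter(tokenize(chunk))
    return sum(min(n, c_counts[tok]) for tok, n in q_counts.items())


def retrieve_top_k(question, document, k=2, chunk_size=50):
    """Return the k highest-scoring chunks, restored to document order."""
    chunks = split_into_chunks(document, chunk_size)
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: overlap_score(question, chunks[i]),
        reverse=True,
    )
    return [chunks[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Toy document: the answer sits in one sentence surrounded by filler text.
    doc = "filler " * 100 + "The benchmark was released in 2023 by researchers." + " padding" * 100
    selected = retrieve_top_k("When was the benchmark released?", doc, k=1, chunk_size=20)
    print(selected[0])
```

A selection step like this helps when the evidence is localized in one place. A long-dependency question whose evidence is spread across many distant passages would not be served well by a small top-k, which is consistent with the contrast the findings draw between short and long dependency tasks.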
Abstract
Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have data leakage problems; and an emphasis on short dependency tasks rather than long dependency tasks. In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding. LooGLE features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. Human annotators meticulously crafted more…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
