LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen

TL;DR
LongReason is a synthetic benchmark designed to evaluate the long-context reasoning abilities of large language models across diverse tasks, revealing current models' limitations as context length increases.
Contribution
We introduce LongReason, a synthetic benchmark of 794 multiple-choice questions for assessing long-context reasoning in LLMs. It spans three task categories (reading comprehension, logical inference, and mathematical word problems) with diverse reasoning patterns.
Findings
Most LLMs' performance drops with longer context
State-of-the-art models still have significant room for improvement
LongReason is publicly available for research use
Abstract
Large language models (LLMs) have made remarkable progress in understanding long-context inputs. However, benchmarks for evaluating their long-context reasoning abilities have not kept pace. Existing benchmarks often cover a narrow range of tasks or focus on tasks that do not demand complex reasoning. To address this gap and enable a more comprehensive evaluation of the long-context reasoning capabilities of current LLMs, we propose a new synthetic benchmark, LongReason, constructed by synthesizing long-context reasoning questions from a varied set of short-context reasoning questions through context expansion. LongReason consists of 794 multiple-choice reasoning questions with diverse reasoning patterns across three task categories: reading comprehension, logical inference, and mathematical word problems. We evaluate 21 LLMs on LongReason, revealing that most models…
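The abstract describes context expansion only at a high level. The sketch below illustrates one plausible form of it, under stated assumptions: a short-context question is assumed to split into background facts plus a final inquiry, and "expansion" is assumed to mean interleaving those facts, in their original order, with irrelevant filler text until a target length is reached. `ShortQuestion`, `expand_context`, and `build_prompt` are hypothetical names, and this is not the authors' actual pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class ShortQuestion:
    # Hypothetical structure: facts needed to answer, the final inquiry,
    # the multiple-choice options, and the correct option letter.
    background: list[str]
    inquiry: str
    options: list[str]
    answer: str


def expand_context(q: ShortQuestion, filler: list[str], target_words: int,
                   seed: int = 0) -> str:
    """Hide the background facts, in order, inside filler text so the context
    reaches at least `target_words` words (assumed expansion scheme)."""
    rng = random.Random(seed)
    # Gather filler sentences (cycling if needed) until the word budget is met.
    needed = target_words - sum(len(s.split()) for s in q.background)
    haystack: list[str] = []
    i = 0
    while needed > 0 and filler:
        sentence = filler[i % len(filler)]
        haystack.append(sentence)
        needed -= len(sentence.split())
        i += 1
    # Sorted insertion points keep the facts in their original relative order;
    # each earlier insert shifts later positions by one, hence `+ offset`.
    positions = sorted(rng.randint(0, len(haystack)) for _ in q.background)
    for offset, (pos, fact) in enumerate(zip(positions, q.background)):
        haystack.insert(pos + offset, fact)
    return " ".join(haystack)


def build_prompt(q: ShortQuestion, context: str) -> str:
    """Format the expanded context, inquiry, and lettered options."""
    letters = "ABCDEFGH"
    opts = "\n".join(f"{letters[k]}. {o}" for k, o in enumerate(q.options))
    return f"{context}\n\nQuestion: {q.inquiry}\n{opts}\nAnswer:"
```

For example, a two-fact arithmetic question can be padded to about 50 words; the answer is unchanged because only irrelevant sentences are added, which is why accuracy on such items isolates the effect of context length.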
