GPT-5 vs Other LLMs in Long Short-Context Performance
Nima Esmi (1, 2), Maryam Nezhad-Moghaddam (3), Fatemeh Borhani (3), Asadollah Shahbahrami (2, 3), Amin Daemdoost (3), Georgi Gaydadjiev (4) ((1) Bernoulli Institute, RUG, Groningen, Netherlands, (2) ISRC, Khazar University, Baku, Azerbaijan

TL;DR
This paper evaluates the long-context processing capabilities of GPT-5 and other leading LLMs. Performance drops significantly at large input volumes, yet GPT-5 retains high precision, which highlights the gap between theoretical context capacity and practical use.
Contribution
It provides a comparative analysis of GPT-5 and other models on long-context tasks, demonstrating improvements in handling large inputs and addressing the 'lost in the middle' problem.
Findings
Performance of all models degrades significantly once the input exceeds 5K posts (about 70K tokens)
GPT-5 maintains high precision (~95%) despite the drop in accuracy
The 'lost in the middle' problem is largely resolved in newer models
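High precision alongside falling accuracy can look contradictory, so a small worked example may help. The sketch below uses the standard definitions of accuracy, precision, and recall for a binary classifier. The confusion-matrix counts are hypothetical, chosen only to illustrate the effect, and are not taken from the paper.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all items that were labelled correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of items flagged positive that really are positive."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly positive items that were flagged."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts (NOT from the paper): the model flags few posts,
# is almost always right when it does, but misses many positive posts.
tp, fp, fn, tn = 95, 5, 300, 600

print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
print(f"accuracy  = {accuracy(tp, fp, fn, tn):.3f}")
```

With counts like these, precision stays high because the few positive predictions are mostly correct, while accuracy falls because many positive posts are missed. Whether this is the mechanism behind the reported numbers is an assumption; the summary above does not say.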
Abstract
With the significant expansion of the context window in Large Language Models (LLMs), these models are theoretically capable of processing millions of tokens in a single pass. However, research indicates a significant gap between this theoretical capacity and the practical ability of models to robustly utilize information within long contexts, especially in tasks that require a comprehensive understanding of numerous details. This paper evaluates the performance of four state-of-the-art models (Grok-4, GPT-4, Gemini 2.5, and GPT-5) on long short-context tasks. For this purpose, three datasets were used: two supplementary datasets for retrieving culinary recipes and math problems, and a primary dataset of 20K social media posts for depression detection. The results show that as the input volume on the social media dataset exceeds 5K posts (70K tokens), the performance of all models…
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Sentiment Analysis and Opinion Mining
