What Works for 'Lost-in-the-Middle' in LLMs? A Study on GM-Extract and Mitigations
Mihir Gupte, Eshan Dixit, Muhammad Tayyab, Arun Adiththan

TL;DR
This paper investigates the 'lost-in-the-middle' challenge in large language models, introduces GM-Extract for evaluation, and analyzes the effectiveness of various mitigation strategies in real-world retrieval tasks.
Contribution
It presents GM-Extract, a new benchmark dataset, and provides a systematic evaluation of model performance and mitigation techniques for the 'lost-in-the-middle' phenomenon.
Findings
Performance varies significantly with data representation.
Mitigation methods have nuanced and sometimes negative effects.
Model performance correlates with perplexity scores.
Abstract
The diminishing ability of large language models (LLMs) to effectively utilize long-range context (the "lost-in-the-middle" phenomenon) poses a significant challenge in retrieval-based LLM applications. To study the impact of this phenomenon in a real-world application setting, we introduce GM-Extract, a novel benchmark dataset meticulously designed to evaluate LLM performance on retrieval of control variables. To accurately diagnose failure modes, we propose a simple yet elegant evaluation system using two distinct metrics: one for spatial retrieval capability (Document Metric) and the other for semantic retrieval capability (Variable Extraction Metric). We conduct a systematic evaluation of 7-8B parameter models on two multi-document tasks (key-value extraction and question-answering), demonstrating a significant change in retrieval performance simply by altering how the data is…
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies
