From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Zheyang Xiong; Vasilis Papageorgiou; Kangwook Lee; Dimitris Papailiopoulos

arXiv:2406.19292·cs.LG·October 15, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a finetuning method using synthetic data to enhance LLMs' retrieval and reasoning abilities in long-context scenarios, demonstrating significant improvements without degrading performance on general benchmarks.

Contribution

The study presents a novel synthetic dataset for finetuning LLMs, significantly improving long-context retrieval and reasoning capabilities while maintaining overall benchmark performance.

Findings

01

10.5% improvement on MDQA with 20 documents at position 10 for GPT-3.5 Turbo

02

Finetuning on synthetic data does not cause hallucinations or performance drops on benchmarks like TriviaQA

03

Synthetic data-based finetuning enhances long-context task performance without harming general abilities.

Abstract

Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. Our experiments on models like GPT-3.5 Turbo and Mistral 7B demonstrate that finetuning LLMs on this dataset significantly improves LLMs' information retrieval and reasoning capabilities in longer-context settings. We present an analysis of the finetuned models, illustrating the transfer of skills from synthetic to real task evaluations (e.g., $10.5\%$ improvement on $20$ documents MDQA at position $10$ for GPT-3.5 Turbo). We also find that finetuned LLMs' performance on general benchmarks remains almost constant while LLMs finetuned on other baseline…
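
To make "numerical key-value retrieval tasks" concrete, here is a minimal sketch of how one such synthetic example could be generated. It is an illustration only, not the authors' data format: the function name `make_kv_retrieval_example`, the prompt wording, the dictionary sizes, and the value range are all assumptions introduced here.

```python
import random


def make_kv_retrieval_example(num_dicts=5, keys_per_dict=3, seed=0):
    """Build one hypothetical key-value retrieval prompt and its answer.

    All names, sizes, and the prompt template are illustrative assumptions,
    not the format used in the paper.
    """
    rng = random.Random(seed)

    # Draw globally unique integer keys so exactly one dictionary holds the target.
    total_keys = num_dicts * keys_per_dict
    keys = rng.sample(range(10_000), total_keys)

    # Split the keys into dictionaries and assign random integer values.
    dicts = []
    for i in range(num_dicts):
        chunk = keys[i * keys_per_dict:(i + 1) * keys_per_dict]
        dicts.append({k: rng.randrange(10_000) for k in chunk})

    # Pick the key to retrieve from a randomly chosen position in the context.
    target_idx = rng.randrange(num_dicts)
    target_key = rng.choice(list(dicts[target_idx]))
    answer = dicts[target_idx][target_key]

    # Assemble the long-context prompt: all dictionaries, then the question.
    lines = [f"Dictionary [{i}] {d}" for i, d in enumerate(dicts)]
    prompt = "\n".join(lines) + f"\nReport the value of key {target_key}."
    return prompt, target_key, answer


if __name__ == "__main__":
    prompt, key, answer = make_kv_retrieval_example(seed=1)
    print(prompt)
    print("expected answer:", answer)
```

Increasing `num_dicts` lengthens the context, and because the target position is sampled uniformly, examples cover retrieval from the start, middle, and end of the input.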

Code & Models

Repositories

edixiong/artificial-needles
none · Official

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Residual Connection · Multi-Head Attention · Weight Decay · Softmax · Layer Normalization