Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Jinhyuk Lee; Anthony Chen; Zhuyun Dai; Dheeru Dua; Devendra Singh Sachan; Michael Boratko; Yi Luan; Sébastien M. R. Arnold; Vincent Perot; Siddharth Dalmia; Hexiang Hu; Xudong Lin; Panupong Pasupat; Aida Amini; Jeremy R. Cole; Sebastian Riedel; Iftekhar Naim; Ming-Wei Chang; Kelvin Guu

arXiv:2406.13121 · cs.CL · June 21, 2024 · 3 cites

Open Access 1 Repo

TL;DR

Long-context language models (LCLMs) may be able to replace dedicated retrieval and reasoning pipelines by ingesting the relevant information directly in their context. On the LOFT benchmark they perform competitively with specialized systems, while the results also point to clear areas for improvement.

Contribution

This paper introduces LOFT, a benchmark of real-world tasks with contexts of up to millions of tokens for evaluating long-context language models, and shows that these models can, surprisingly, rival specialized retrieval and RAG systems.

Findings

01

LCLMs can rival state-of-the-art retrieval and RAG systems without being explicitly trained for these tasks.

02

Performance is heavily influenced by prompting strategies.

03

Challenges remain in compositional reasoning, such as the multi-step operations required by SQL-like tasks.

Abstract

Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been…
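The core idea in the abstract, placing an entire corpus in the model's context and asking the model itself to retrieve from it, can be sketched as a prompt-construction step followed by output parsing and scoring. This is a minimal sketch under stated assumptions: the helper names (`build_cic_prompt`, `parse_ids`, `recall_at_k`), the plain-text prompt template, the section ordering (instruction, corpus, few-shot examples, query), and the ID scheme are all hypothetical and are not LOFT's actual templates or the google-deepmind/loft code. No model is called; a hard-coded completion string stands in for an LCLM's output.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    title: str
    text: str


def build_cic_prompt(corpus, query, few_shot=()):
    """Hypothetical corpus-in-context prompt: every passage is placed in the
    context with an ID, and the model is asked to answer with passage IDs."""
    lines = [
        "You will be given a list of documents. "
        "Output the IDs of the documents most relevant to the query."
    ]
    # Assumed format: one line per document, tagged with its ID.
    for d in corpus:
        lines.append(f"ID: {d.doc_id} | TITLE: {d.title} | CONTENT: {d.text}")
    # Optional few-shot examples whose answers point back into the corpus.
    for q, ids in few_shot:
        lines.append(f"Query: {q}")
        lines.append(f"Answer: {', '.join(ids)}")
    lines.append(f"Query: {query}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_ids(completion, valid_ids):
    """Keep IDs that exist in the corpus, in the order the model emitted them,
    dropping duplicates and hallucinated IDs."""
    predicted = []
    for token in re.findall(r"[A-Za-z0-9_-]+", completion):
        if token in valid_ids and token not in predicted:
            predicted.append(token)
    return predicted


def recall_at_k(predicted, gold, k):
    """Fraction of gold IDs found among the top-k predictions."""
    if not gold:
        return 0.0
    return len(set(predicted[:k]) & set(gold)) / len(gold)


if __name__ == "__main__":
    corpus = [
        Doc("doc0", "Paris", "Paris is the capital of France."),
        Doc("doc1", "Berlin", "Berlin is the capital of Germany."),
        Doc("doc2", "Rome", "Rome is the capital of Italy."),
    ]
    prompt = build_cic_prompt(
        corpus, "capital of Germany", few_shot=[("capital of Italy", ["doc2"])]
    )
    print(prompt)
    # Stand-in for a model completion; "doc7" is not in the corpus.
    fake_completion = "doc2, doc7, doc1"
    preds = parse_ids(fake_completion, {d.doc_id for d in corpus})
    print(preds, recall_at_k(preds, ["doc1"], 1), recall_at_k(preds, ["doc1"], 2))
```

Filtering the completion against the corpus IDs is one simple way to discard hallucinated identifiers before scoring; a real evaluation would substitute an actual LCLM call and the benchmark's own prompt format and metrics.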

Code & Models

Repositories

google-deepmind/loft (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · WordPiece · Residual Connection · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Linear Warmup With Linear Decay