Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

Thomas Merth; Qichen Fu; Mohammad Rastegari; Mahyar Najibi

arXiv:2404.06910 · cs.CL · July 22, 2024 · 1 citation

TL;DR

Superposition prompting is a novel method that enables large language models to process multiple input paths in parallel, reducing inference costs and improving accuracy in retrieval-augmented generation without fine-tuning.

Contribution

The paper introduces superposition prompting, a RAG technique that improves the efficiency and accuracy of pre-trained LLMs by processing prompt paths in parallel, with no fine-tuning required.

Findings

01. 93x reduction in compute time over naive RAG on the NaturalQuestions-Open dataset

02. 43% accuracy improvement over naive RAG with MPT-7B

03. Effective across various question-answering benchmarks

Abstract

Despite their successes, large language models (LLMs) exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with sequence length, making them expensive to deploy in some real-world text-processing applications, such as retrieval-augmented generation (RAG). LLMs also exhibit the "distraction phenomenon", where irrelevant context in the prompt degrades output quality. To address these drawbacks, we propose a novel RAG prompting methodology, *superposition prompting*, which can be directly applied to pre-trained transformer-based LLMs *without the need for fine-tuning*. At a high level, superposition prompting allows the LLM to process input documents in parallel *prompt paths*, discarding paths once they are deemed irrelevant. We demonstrate the capability of our method to simultaneously…
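
The quadratic-cost argument in the abstract can be illustrated with a toy count of attention pairs. The sketch below is not the paper's cost accounting: it assumes causal self-attention, counts only (query token, key token) pairs, and uses hypothetical function names and made-up token lengths. It compares one long concatenated prompt against the same documents laid out as independent paths that share a preamble, and it ignores path pruning entirely.

```python
def causal_attention_pairs(n: int) -> int:
    """Count (query, key) pairs when each of n tokens attends to itself and all earlier tokens."""
    return n * (n + 1) // 2


def concatenated_cost(preamble: int, doc: int, num_docs: int, query: int) -> int:
    """Attention pairs when all documents are concatenated into a single sequence."""
    return causal_attention_pairs(preamble + num_docs * doc + query)


def parallel_paths_cost(preamble: int, doc: int, num_docs: int, query: int) -> int:
    """Attention pairs when each document forms its own path.

    Assumption (illustrative only): the preamble is processed once, and every
    path holds one document plus a copy of the query, attending only to the
    preamble and to its own tokens.
    """
    shared = causal_attention_pairs(preamble)
    per_path = causal_attention_pairs(preamble + doc + query) - shared
    return shared + num_docs * per_path


if __name__ == "__main__":
    # Hypothetical token lengths, chosen only to make the arithmetic concrete.
    lengths = dict(preamble=20, doc=100, num_docs=20, query=20)
    naive = concatenated_cost(**lengths)
    parallel = parallel_paths_cost(**lengths)
    print(f"concatenated: {naive} pairs")
    print(f"parallel paths: {parallel} pairs")
    print(f"ratio: {naive / parallel:.1f}x")
```

With these assumed lengths the parallel layout needs roughly ten times fewer attention pairs, and the gap widens as more documents are added, because the concatenated cost grows quadratically in the number of documents while the per-path cost grows linearly. The ratio is illustrative only and is not meant to reproduce the figures listed under Findings.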

Code & Models

Repositories

apple/ml-superposition-prompting (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling

Methods: Attention Is All You Need · WordPiece · Byte Pair Encoding · Linear Layer · Layer Normalization · Weight Decay · Dense Connections · Attention Dropout · Residual Connection