Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro

TL;DR
This paper presents a comprehensive study of pretraining large autoregressive language models with retrieval. It shows improved text generation quality and downstream task performance compared with standard GPT, and it introduces RETRO++ for better open-domain question answering.
Contribution
It provides a scalable recipe for pretraining retrieval-augmented LMs such as RETRO (reproduced up to 9.5B parameters, retrieving from a 330B-token corpus). It also introduces RETRO++, a variant that significantly improves open-domain QA results.
Findings
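RETRO outperforms GPT in text generation, with much less degeneration (repetition), moderately higher factual accuracy, and slightly lower toxicity when paired with a nontoxic retrieval database.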
RETRO largely outperforms GPT on knowledge-intensive tasks.
RETRO++ significantly improves open-domain QA performance.
Abstract
Large decoder-only language models (LMs) can be substantially improved in terms of perplexity by retrieval (e.g., RETRO), but the impact of retrieval on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pretrained retrieval-augmented LM (i.e., RETRO), compared with standard GPT and with retrieval-augmented GPT incorporated at the fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving from a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation…
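The retrieval step that distinguishes RETRO from standard GPT can be illustrated with a toy sketch of chunked nearest-neighbor lookup. This sketch relies on details from the original RETRO design, not from this page. The input is split into fixed-size chunks (64 tokens in RETRO), each chunk is embedded by a frozen encoder, and its nearest neighbors are looked up in the retrieval database.

The following pieces are hypothetical stand-ins chosen for illustration:

- `embed` replaces the frozen encoder with a bag-of-words vector.
- Brute-force cosine search replaces an approximate nearest-neighbor index over the 330B-token corpus.
- All function names are illustrative only.

```python
import math
from collections import Counter

# RETRO uses 64-token chunks; a tiny chunk size keeps this toy example readable.
CHUNK_SIZE = 4


def split_into_chunks(tokens, chunk_size=CHUNK_SIZE):
    """Split a token sequence into consecutive fixed-size chunks (the last may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed(chunk):
    """Hypothetical stand-in for a frozen encoder: a bag-of-words count vector."""
    return Counter(chunk)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_neighbors(chunks, database, k=2):
    """Return, for each input chunk, the k most similar database chunks.

    Brute-force search over every entry; a real system would use an
    approximate nearest-neighbor index instead.
    """
    db_vectors = [embed(entry) for entry in database]
    results = []
    for chunk in chunks:
        query = embed(chunk)
        scores = [cosine(query, vec) for vec in db_vectors]
        # sorted() is stable, so tied scores keep database order
        ranked = sorted(range(len(database)), key=lambda j: scores[j], reverse=True)
        results.append([database[j] for j in ranked[:k]])
    return results


if __name__ == "__main__":
    database = [
        ["the", "cat", "sat", "down"],
        ["stock", "prices", "rose", "today"],
        ["a", "cat", "sat", "quietly"],
    ]
    tokens = ["the", "cat", "sat", "on", "stock", "prices", "fell", "today"]
    chunks = split_into_chunks(tokens)
    for chunk, neighbors in zip(chunks, retrieve_neighbors(chunks, database, k=1)):
        print(chunk, "->", neighbors)
```

Running the demo pairs the first chunk with `['the', 'cat', 'sat', 'down']` and the second with `['stock', 'prices', 'rose', 'today']`.

In the actual model, the retrieved neighbors are consumed through chunked cross-attention when predicting the tokens that follow each chunk. As a result, retrieval never uses the content of the chunk currently being predicted.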
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Residual Connection · Cosine Annealing · Softmax · Linear Layer · Byte Pair Encoding · Layer Normalization · Linear Warmup With Cosine Annealing · Dense Connections
