FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

TL;DR
This paper evaluates the factual accuracy of large language models on questions whose answers change over time, introduces the FreshQA benchmark, and proposes FreshPrompt, a retrieval-augmented prompting method that improves model performance on questions about current knowledge.
Contribution
It introduces the FreshQA benchmark for dynamic knowledge questions and proposes FreshPrompt, a retrieval-based prompting technique that improves LLM accuracy by supplying up-to-date information.
Findings
Models struggle with fast-changing knowledge and false premises.
FreshPrompt outperforms existing search-augmented prompting methods.
Retrieval quantity and order significantly impact answer correctness.
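The last finding can be made concrete with a minimal sketch of how retrieved evidence might be assembled into a prompt. This is not the paper's FreshPrompt implementation: the `Evidence` fields, the `build_prompt` function, the prompt layout, and the choice to sort by publication date are all assumptions made for illustration. It only shows the two knobs the finding refers to, how many retrieved snippets are kept (`k`) and in what order they appear (`newest_last`).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Evidence:
    """One retrieved search result (hypothetical structure)."""
    source: str
    published: date
    snippet: str


def build_prompt(question, evidences, k=3, newest_last=True):
    """Assemble a retrieval-augmented prompt from at most k evidences."""
    # Quantity knob: keep only the k most recently published evidences.
    recent = sorted(evidences, key=lambda e: e.published, reverse=True)[:k]
    # Order knob: with newest_last=True the freshest snippet sits
    # closest to the question; with False it comes first.
    ordered = sorted(recent, key=lambda e: e.published, reverse=not newest_last)

    lines = []
    for e in ordered:
        lines.append(f"source: {e.source}")
        lines.append(f"date: {e.published.isoformat()}")
        lines.append(f"snippet: {e.snippet}")
        lines.append("")  # blank line between evidences
    lines.append(f"question: {question}")
    lines.append("answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder evidences, not real search results.
    demo = [
        Evidence("a.example", date(2021, 1, 1), "snippet-old"),
        Evidence("b.example", date(2023, 5, 1), "snippet-new"),
        Evidence("c.example", date(2022, 3, 1), "snippet-mid"),
    ]
    print(build_prompt("Example question?", demo, k=2))
```

Varying `k` and `newest_last` while holding the question fixed is one simple way to probe how sensitive a model's answers are to retrieval quantity and order.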
Abstract
Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Through human evaluations involving more than 50K judgments, we shed light on limitations of these models and demonstrate significant room for improvement: for instance, all…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems
