FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Tu Vu; Mohit Iyyer; Xuezhi Wang; Noah Constant; Jerry Wei; Jason Wei; Chris Tar; Yun-Hsuan Sung; Denny Zhou; Quoc Le; Thang Luong

arXiv:2310.03214 · cs.CL · November 23, 2023 · 6 cites

TL;DR

This paper evaluates the factual accuracy of large language models on questions that require current world knowledge, introduces the FreshQA benchmark, and proposes FreshPrompt, a retrieval-augmented prompting method that improves model performance on such questions.

Contribution

The paper introduces the FreshQA benchmark for dynamic knowledge questions and proposes FreshPrompt, a search-engine-augmented prompting technique that improves LLM accuracy by supplying up-to-date information.

Findings

01. Models struggle with fast-changing knowledge and false premises.

02. FreshPrompt outperforms existing search-augmented prompting methods.

03. Retrieval quantity and order significantly impact answer correctness.
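
To make the third finding concrete, here is a minimal sketch of how retrieved evidence might be assembled into a prompt. It is not the paper's implementation: this summary does not give FreshPrompt's exact format, so the `Evidence` schema, the `build_prompt` function, the field labels, and the default `k` are all hypothetical. The sketch assumes that the `k` most recent results are kept and listed from oldest to newest, so the freshest evidence sits closest to the question.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Evidence:
    """One retrieved search result (hypothetical schema, not from the paper)."""
    source: str
    published: date
    snippet: str


def build_prompt(question: str, evidences: list[Evidence], k: int = 5) -> str:
    """Assemble a prompt from at most k evidences (illustrative only).

    Assumption: keep the k most recent evidences, then list them oldest
    to newest so the most recent one appears right before the question.
    """
    # Retrieval quantity: keep only the k most recent results.
    most_recent = sorted(evidences, key=lambda e: e.published, reverse=True)[:k]
    # Retrieval order: present the kept results chronologically.
    ordered = sorted(most_recent, key=lambda e: e.published)

    lines = []
    for e in ordered:
        lines.append(f"source: {e.source}")
        lines.append(f"date: {e.published.isoformat()}")
        lines.append(f"snippet: {e.snippet}")
        lines.append("")  # blank line between evidences
    lines.append(f"question: {question}")
    lines.append("answer:")
    return "\n".join(lines)
```

Changing `k` or reversing the sort in this sketch alters both how much evidence the model sees and which item lands nearest the question, the two factors the finding identifies as affecting answer correctness.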

Abstract

Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Through human evaluations involving more than 50K judgments, we shed light on limitations of these models and demonstrate significant room for improvement: for instance, all…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems