How Well Do Large Language Models Truly Ground?

Hyunji Lee; Sejune Joo; Chaeeun Kim; Joel Jang; Doyoung Kim; Kyoung-Woon On; Minjoon Seo

arXiv:2311.09069 · cs.CL · July 2, 2024 · 1 citation

TL;DR

This paper proposes a stricter definition of grounding for large language models, introduces a new dataset and metric, and evaluates 25 models to better understand and improve their reliability and control.

Contribution

It defines a comprehensive grounding criterion, creates a new dataset and metric, and evaluates diverse LLMs to analyze factors affecting grounding performance.

Findings

01. Grounding performance varies significantly across models.

02. Larger models generally show better grounding capabilities.

03. Certain training methods enhance grounding reliability.

Abstract

To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under the definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of…
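The two-part definition in the abstract can be illustrated with a toy check. The sketch below is not the paper's metric or dataset format: it assumes, purely for illustration, that the context, the required knowledge, and the response have each already been broken down into sets of fact strings, and that facts can be compared by exact string match. The names `grounding_scores` and `is_truly_grounded` are hypothetical.

```python
def grounding_scores(gold_facts, response_facts, context_facts):
    """Return (coverage, containment) for a response.

    coverage:    share of the required (gold) facts that the response uses,
                 mirroring condition (1) in the abstract.
    containment: share of the response's facts that appear in the context,
                 mirroring condition (2) in the abstract.
    Exact string matching is a simplifying assumption of this sketch.
    """
    gold = set(gold_facts)
    resp = set(response_facts)
    ctx = set(context_facts)
    coverage = len(gold & resp) / len(gold) if gold else 1.0
    containment = len(resp & ctx) / len(resp) if resp else 1.0
    return coverage, containment


def is_truly_grounded(gold_facts, response_facts, context_facts):
    """A response passes only if both conditions hold in full."""
    coverage, containment = grounding_scores(
        gold_facts, response_facts, context_facts
    )
    return coverage == 1.0 and containment == 1.0


# Hypothetical example data, invented for this sketch.
context = {
    "Paris is the capital of France",
    "France is in Europe",
    "The Seine runs through Paris",
}
gold = {"Paris is the capital of France", "France is in Europe"}

complete = set(gold)                                  # uses all needed facts
partial = {"Paris is the capital of France"}          # misses a needed fact
overreach = gold | {"Paris has 2 million residents"}  # adds an outside fact

for name, response in [("complete", complete),
                       ("partial", partial),
                       ("overreach", overreach)]:
    print(name, is_truly_grounded(gold, response, context))
```

In this toy setting, a response that contains the correct answer can still fail: `partial` violates condition (1) and `overreach` violates condition (2), which is the gap the stricter definition is meant to expose.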

Code & Models

Repositories

kaistai/how-well-do-llms-truly-ground
PyTorch · Official

Videos

How Well Do Large Language Models Truly Ground? · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification