Measuring Attribution in Natural Language Generation Models

Hannah Rashkin; Vitaly Nikolaev; Matthew Lamm; Lora Aroyo; Michael Collins; Dipanjan Das; Slav Petrov; Gaurav Singh Tomar; Iulia Turc; David Reitter

arXiv:2112.12870 · cs.CL · August 4, 2022 · 34 cites

TL;DR

This paper introduces AIS, a new evaluation framework for assessing whether natural language generation models produce output supported by verifiable external sources, validated through human studies across multiple datasets.

Contribution

The paper presents AIS, an evaluation framework with a two-stage annotation pipeline for measuring whether NLG output about the external world is attributable to identified sources.

Findings

01. AIS correlates well with human judgments of source support

02. Validated across conversational QA, summarization, and table-to-text datasets

03. Provides a standardized approach for attribution evaluation in NLG

Abstract

With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether…
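
As a rough illustration of how a two-stage annotation pipeline might be recorded and aggregated, here is a minimal Python sketch. The abstract above does not spell out what the two stages are, so the sketch assumes a first stage that asks whether the output is interpretable and a second stage, asked only when the first passes, that asks whether the output is fully supported by the source. The names `Rating`, `rate`, and `ais_score`, and the choice to report the fraction of ratings passing both stages, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    """One annotator's judgment of one model output (hypothetical schema)."""
    interpretable: bool            # assumed stage 1: is the output understandable?
    attributable: Optional[bool]   # assumed stage 2: supported by the source? None if stage 1 failed


def rate(interpretable: bool, attributable: Optional[bool] = None) -> Rating:
    """Enforce the gating: stage 2 is only recorded when stage 1 passes."""
    if not interpretable:
        return Rating(False, None)
    if attributable is None:
        raise ValueError("stage 2 answer required for interpretable output")
    return Rating(True, attributable)


def ais_score(ratings: list[Rating]) -> float:
    """Fraction of ratings that pass both stages (an assumed aggregation)."""
    if not ratings:
        raise ValueError("no ratings")
    passed = sum(1 for r in ratings if r.interpretable and r.attributable)
    return passed / len(ratings)


# Four illustrative ratings: two pass both stages, one fails stage 2,
# one fails stage 1 and is never asked the stage 2 question.
ratings = [rate(True, True), rate(True, False), rate(False), rate(True, True)]
score = ais_score(ratings)
```

Gating the second question on the first keeps annotators from judging source support for output they could not understand in the first place; whether such outputs should count against the score, as they do here, is a design choice of this sketch.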

Code & Models

Repositories

google-research-datasets/attributed-qa
tf

Datasets

GEM/TaTA
dataset · 22 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems