Zero-shot Factual Consistency Evaluation Across Domains

Raunak Agarwal

arXiv:2408.04114·cs.CL·August 9, 2024

TL;DR

This paper trains unified models that evaluate the factual consistency of generated text across multiple domains. They achieve state-of-the-art results on a 22-dataset benchmark suite while addressing efficiency and cross-domain generalization.

Contribution

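It unifies Natural Language Inference, Summarization Evaluation, Factuality Verification, and Factual Consistency Evaluation into a single source-target evaluation task, and trains models on the combined data to improve cross-domain performance and efficiency.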

Findings

01. Achieves state-of-the-art performance on a heterogeneous benchmark suite of 22 datasets

02. Demonstrates strong cross-domain generalization

03. Addresses efficiency concerns in factual consistency evaluation

Abstract

This work addresses the challenge of factual consistency in text generation systems. We unify the tasks of Natural Language Inference, Summarization Evaluation, Factuality Verification and Factual Consistency Evaluation to train models capable of evaluating the factual consistency of source-target pairs across diverse domains. We rigorously evaluate these against eight baselines on a comprehensive benchmark suite comprising 22 datasets that span various tasks, domains, and document lengths. Results demonstrate that our method achieves state-of-the-art performance on this heterogeneous benchmark while addressing efficiency concerns and attaining cross-domain generalization.
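As a rough illustration of what unifying these tasks into source-target pairs could look like, the sketch below maps examples from three task formats onto one binary schema. It is not the paper's pipeline: the `Pair` class, the converter functions, the input field names (`premise`, `document`, `evidence`, and so on), and the rule that only `entailment` or `SUPPORTS` counts as consistent are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical unified record: is `target` consistent with `source`?"""
    source: str
    target: str
    consistent: bool
    task: str


def from_nli(ex: dict) -> Pair:
    # Assumption: only "entailment" counts as factually consistent.
    return Pair(ex["premise"], ex["hypothesis"], ex["label"] == "entailment", "nli")


def from_summarization(ex: dict) -> Pair:
    # Assumption: the dataset carries a boolean consistency annotation.
    return Pair(ex["document"], ex["summary"], bool(ex["is_consistent"]), "summarization")


def from_fact_verification(ex: dict) -> Pair:
    # Assumption: labels follow a SUPPORTS / REFUTES style scheme.
    return Pair(ex["evidence"], ex["claim"], ex["label"] == "SUPPORTS", "fact_verification")


# Hypothetical registry of per-task converters.
CONVERTERS = {
    "nli": from_nli,
    "summarization": from_summarization,
    "fact_verification": from_fact_verification,
}


def unify(task: str, examples: list[dict]) -> list[Pair]:
    """Convert task-specific examples into the shared source-target schema."""
    convert = CONVERTERS[task]
    return [convert(ex) for ex in examples]


if __name__ == "__main__":
    pairs = unify("nli", [
        {"premise": "A cat sleeps on the sofa.",
         "hypothesis": "An animal is sleeping.",
         "label": "entailment"},
    ])
    print(pairs[0])
```

Once every dataset is expressed as `Pair` records, a single classifier can be trained and evaluated on all of them with the same input format, which is the practical appeal of a unified formulation.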

Code & Models

Repositories

raunak-agarwal/factual-consistency-eval
PyTorch · Official

Taxonomy

Topics: Information and Cyber Security · Software Engineering Research · Topic Modeling