Zero-shot Factual Consistency Evaluation Across Domains
Raunak Agarwal

TL;DR
This paper introduces a unified model for evaluating factual consistency in text generation across multiple domains. It achieves state-of-the-art results on a benchmark suite of 22 datasets and addresses efficiency and generalization challenges.
Contribution
It unifies various factual evaluation tasks into a single model trained on multiple datasets, improving cross-domain performance and efficiency.
Findings
Achieves state-of-the-art performance on a 22-dataset benchmark spanning various tasks, domains, and document lengths
Demonstrates strong cross-domain generalization
Addresses efficiency concerns in factual evaluation
Abstract
This work addresses the challenge of factual consistency in text generation systems. We unify the tasks of Natural Language Inference, Summarization Evaluation, Factuality Verification and Factual Consistency Evaluation to train models capable of evaluating the factual consistency of source-target pairs across diverse domains. We rigorously evaluate these models against eight baselines on a comprehensive benchmark suite comprising 22 datasets that span various tasks, domains, and document lengths. Results demonstrate that our method achieves state-of-the-art performance on this heterogeneous benchmark while addressing efficiency concerns and attaining cross-domain generalization.
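The abstract does not spell out how the four task families are cast into one training format or how results over 22 datasets are summarized. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that every example becomes a (source, target, binary label) pair, that NLI "neutral" and "contradiction" both count as inconsistent, that FEVER-style labels are used for fact verification, that human summary ratings are binarized at a hypothetical 0.5 threshold, and that the benchmark-level number is a macro-average of per-dataset balanced accuracy. All names (`PairExample`, `from_nli`, `from_fact_verification`, `from_summary_rating`, `macro_average`, `overlap_predict`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


# Hypothetical unified record: every task is cast as a (source, target) pair
# with a binary label: 1 = target is factually consistent with source, 0 = not.
@dataclass(frozen=True)
class PairExample:
    source: str
    target: str
    label: int
    task: str


def from_nli(premise: str, hypothesis: str, label: str) -> PairExample:
    # Assumption: only "entailment" counts as consistent; "neutral" and
    # "contradiction" are both collapsed into "inconsistent".
    return PairExample(premise, hypothesis, int(label == "entailment"), "nli")


def from_fact_verification(evidence: str, claim: str, label: str) -> PairExample:
    # Assumption: FEVER-style labels; only "SUPPORTS" counts as consistent.
    return PairExample(evidence, claim, int(label == "SUPPORTS"), "fact_verification")


def from_summary_rating(document: str, summary: str, score: float,
                        threshold: float = 0.5) -> PairExample:
    # Assumption: a human consistency score in [0, 1] is binarized at a
    # hypothetical threshold.
    return PairExample(document, summary, int(score >= threshold), "summarization")


def balanced_accuracy(labels: List[int], preds: List[int]) -> float:
    # Mean of per-class recall over the classes present in `labels`.
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if idx:
            recalls.append(sum(preds[i] == cls for i in idx) / len(idx))
    return mean(recalls)


def macro_average(datasets: Dict[str, List[PairExample]],
                  predict: Callable[[str, str], int]) -> float:
    # Score each dataset separately, then average, so that large datasets
    # do not dominate the benchmark-level number.
    per_dataset = []
    for examples in datasets.values():
        labels = [ex.label for ex in examples]
        preds = [predict(ex.source, ex.target) for ex in examples]
        per_dataset.append(balanced_accuracy(labels, preds))
    return mean(per_dataset)


def overlap_predict(source: str, target: str) -> int:
    # Toy stand-in for a trained model: "consistent" iff every target token
    # also appears in the source.
    src = set(source.lower().split())
    return int(all(tok in src for tok in target.lower().split()))


benchmark = {
    "toy_nli": [
        from_nli("the cat sat on the mat", "the cat sat", "entailment"),
        from_nli("the cat sat on the mat", "the dog sat", "contradiction"),
    ],
    "toy_fever": [
        from_fact_verification("paris is the capital of france",
                               "paris is the capital", "SUPPORTS"),
        from_fact_verification("paris is the capital of france",
                               "rome is the capital", "REFUTES"),
    ],
}
print(macro_average(benchmark, overlap_predict))  # prints 1.0
```

Averaging per dataset rather than pooling all examples is one way to keep a heterogeneous benchmark from being dominated by its largest members; whether the paper aggregates this way is not stated on this page.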
Taxonomy
Topics: Information and Cyber Security · Software Engineering Research · Topic Modeling
