Evaluation of Faithfulness Using the Longest Supported Subsequence

Anirudh Mittal; Timo Schick; Mikel Artetxe; Jane Dwivedi-Yu

arXiv:2308.12157·cs.CL·August 24, 2023

TL;DR

This paper introduces a new metric called Longest Supported Subsequence (LSS) to evaluate the faithfulness of machine-generated text, demonstrating improved correlation with human judgments and outperforming existing metrics across multiple datasets and models.

Contribution

The paper proposes a novel LSS-based metric for faithfulness evaluation, supported by a new dataset and a fine-tuned model, enhancing accuracy over previous methods.

Findings

01. LSS metric correlates better with human ratings.

02. LSS improves faithfulness evaluation by 18% over state-of-the-art.

03. The metric outperforms others across six summarization models.

Abstract

As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is challenging due to the linguistic diversity and the myriad of possible answers. In this paper, we introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontinuous substring of the claim that is supported by the context, which we refer to as the Longest Supported Subsequence (LSS). Using a new human-annotated dataset, we finetune a model to generate LSS. We introduce a new method of evaluation and demonstrate that these metrics correlate better with human ratings when LSS is employed, as opposed to when it is not. Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art…
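To make the subsequence idea in the abstract concrete, here is a minimal sketch in Python. It only illustrates what it means for a candidate to be a noncontiguous subsequence of a claim (tokens kept in order, gaps allowed). The whitespace tokenization, the function names `is_subsequence` and `coverage`, and the length-ratio score are all assumptions made for illustration; they are not the paper's scoring method. In the paper the LSS is generated by a finetuned model, whereas here the candidate is written by hand.

```python
def is_subsequence(candidate, claim):
    """Return True if the tokens of `candidate` appear in `claim`
    in the same order, with gaps allowed (hypothetical helper)."""
    remaining = iter(claim)
    # `tok in remaining` advances the iterator until it finds `tok`,
    # so each match must occur after the previous one.
    return all(tok in remaining for tok in candidate)


def coverage(candidate, claim):
    """Fraction of claim tokens kept in the candidate subsequence.

    An illustrative proxy only (an assumption, not the paper's metric):
    a claim fully supported by the context would score 1.0.
    """
    if not claim:
        return 0.0
    if not is_subsequence(candidate, claim):
        raise ValueError("candidate is not a subsequence of the claim")
    return len(candidate) / len(claim)


# Hand-written example: suppose the context supports everything
# in the claim except the word "red".
claim = "the cat sat on the red mat".split()
supported = "the cat sat on the mat".split()

print(is_subsequence(supported, claim))
print(round(coverage(supported, claim), 3))
```

Under these assumptions, a candidate that drops more of the claim yields a lower ratio, which is one simple way to read "how much of the claim is supported"; the paper's actual evaluation procedure may differ.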

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification