Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

Vaibhav Adlakha; Parishad BehnamGhader; Xing Han Lu; Nicholas Meade; Siva Reddy

arXiv:2307.16877·cs.CL·April 18, 2024·5 cites

TL;DR

This paper evaluates instruction-following models for question answering, highlighting their strengths in correctness but also their tendency to hallucinate, and proposes improved metrics for more accurate assessment.

Contribution

It introduces new evaluation metrics for correctness and faithfulness, addressing limitations of traditional metrics, and provides a comprehensive analysis of these models' performance.

Findings

01 Instruction-following models are competitive with fine-tuned models in correctness.

02 Models often hallucinate and deviate from provided knowledge.

03 Proposed metrics better reflect true model performance.

Abstract

Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents to their input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided…
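To illustrate why verbosity hurts exact match (EM) and F1, here is a minimal Python sketch of the two metrics. It assumes the common SQuAD-style normalization (lowercasing, stripping punctuation and English articles) and token-overlap F1. The function names and the example question are hypothetical, and this is not taken from the paper's own code.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the normalized token sequences are identical."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: both responses contain the correct answer.
reference = "Paris"
short_answer = "Paris"
verbose_answer = "The capital of France is Paris."

for name, answer in [("short", short_answer), ("verbose", verbose_answer)]:
    em = exact_match(answer, reference)
    f1 = token_f1(answer, reference)
    print(f"{name}: EM={em:.2f} F1={f1:.2f}")
```

In this example the verbose response is correct, yet it scores zero on EM, and its F1 falls to about 0.33 because the extra tokens lower precision. That is the kind of mismatch the abstract describes.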

Code & Models

Repositories

mcgill-nlp/instruct-qa
none · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior