Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

Tobias Schimanski; Jingwei Ni; Mathias Kraus; Elliott Ash; Markus Leippold

arXiv:2402.08277 · cs.CL · June 4, 2024 · 2 cites

TL;DR

This paper develops a systematic approach to improve the faithfulness and robustness of Large Language Models in Evidence-Based Question-Answering by fine-tuning with high-quality synthetic data and benchmarking their performance.

Contribution

It introduces a novel data generation pipeline with quality filters and four benchmark test sets for evaluating LLMs in Evidence-Based QA.

Findings

01 Fine-tuning with synthetic high-quality data enhances model performance

02 Data quality has a greater impact than data quantity on model accuracy

03 Models show improved robustness on both in- and out-of-distribution data

Abstract

Advances towards more faithful and traceable answers of Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue in reaching this goal is basing the answers on reliable sources. However, this Evidence-Based QA has proven to work insufficiently with LLMs in terms of citing the correct sources (source quality) and truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data…
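
To make the idea of an automated data quality filter concrete, here is a minimal Python sketch of one possible check on source quality. It is not the paper's pipeline: the `Example` record, the bracketed `[n]` citation format, and the function names are all hypothetical, chosen only for illustration. It keeps a synthetic example only if the answer cites at least one source and every cited source is among those marked relevant. It does not check answer attributability, which would require comparing each claim against the cited text.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record for one synthetic QA example."""
    question: str
    answer: str          # answer text with inline citations such as "[1]"
    relevant_ids: set    # ids of the sources that actually support the question


def cited_ids(answer: str) -> set:
    """Extract the source ids cited in an answer (assumed "[n]" format)."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", answer)}


def passes_source_filter(ex: Example) -> bool:
    """Keep an example only if it cites something and cites only relevant sources."""
    cited = cited_ids(ex.answer)
    return bool(cited) and cited <= ex.relevant_ids


def filter_examples(examples: list) -> list:
    """Apply the source-quality check to a batch of examples."""
    return [ex for ex in examples if passes_source_filter(ex)]
```

A filter of this kind trades data quantity for quality, since it discards any example with a missing or irrelevant citation.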

Code & Models

Repositories

EdisonNi-hku/Robust_Evidence_Based_QA (PyTorch, official)

Videos

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering · underline

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling