Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains
Ben Malin, Tatiana Kalganova, Nikolaos Boulgouris

TL;DR
This paper introduces a fused faithfulness metric for LLMs: a tree-based model combines several elementary faithfulness metrics, and the resulting score correlates more strongly with human judgments across all tested domains, which makes trustworthiness evaluation more reliable.
Contribution
The paper proposes a metric fusion strategy in which a tree-based model, driven by human judgments, identifies the importance of each elementary metric, improving the accuracy of faithfulness evaluation across multiple domains.
Findings
Fused metric correlates more strongly with human judgments.
Method improves faithfulness assessment across diverse datasets.
Dataset homogenization enables cross-domain evaluation.
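As an illustration of what dataset homogenization could look like in practice, the sketch below maps records from two differently shaped sources onto one shared schema with human scores rescaled to [0, 1]. The summary does not describe the paper's actual procedure, so everything here is an assumption: the `Example` schema, the `homogenise` helper, the field names, the second domain, and the choice of min-max rescaling are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical shared schema for one human-judged LLM response."""
    domain: str
    context: str
    response: str
    human_score: float  # faithfulness judgment rescaled to [0, 1]


def rescale(score, lo, hi):
    """Min-max map a score from the source scale [lo, hi] onto [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (score - lo) / (hi - lo)


def homogenise(record, domain, context_key, response_key, label_key, lo, hi):
    """Convert one source-specific record into the shared schema."""
    return Example(
        domain=domain,
        context=record[context_key],
        response=record[response_key],
        human_score=rescale(record[label_key], lo, hi),
    )


# Two made-up records with different field names and label scales.
raw_qa = {"passage": "The sky is blue.", "answer": "Blue.", "rating": 4}
raw_other = {"source": "Some text.", "output": "Some output.", "is_faithful": 1}

qa_example = homogenise(raw_qa, "question_answering",
                        "passage", "answer", "rating", lo=1, hi=5)
other_example = homogenise(raw_other, "other_domain",
                           "source", "output", "is_faithful", lo=0, hi=1)

print(qa_example.human_score, other_example.human_score)
```

Once every dataset shares one schema and one label scale, the same metrics and the same fusion model can be applied to all domains without per-dataset special cases.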
Abstract
We present a methodology for improving the accuracy of faithfulness evaluation in Large Language Models (LLMs). The proposed methodology is based on the combination of elementary faithfulness metrics into a combined (fused) metric, for the purpose of improving the faithfulness of LLM outputs. The proposed strategy for metric fusion deploys a tree-based model to identify the importance of each metric, which is driven by the integration of human judgements evaluating the faithfulness of LLM responses. This fused metric is demonstrated to correlate more strongly with human judgements across all tested domains for faithfulness. Improving the ability to evaluate the faithfulness of LLMs allows greater confidence to be placed in models, so that they can be deployed in a greater diversity of scenarios. Additionally, we homogenise a collection of datasets across question answering…
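The fusion idea in the abstract can be sketched in a few lines. The example below is not the authors' implementation: it fits a small hand-written regression tree on synthetic data, where `metric_a`, `metric_b`, and `metric_c` are placeholder names for elementary metrics (the third is pure noise), the "human" scores are simulated, and Pearson correlation is an assumed choice of agreement measure. It shows the three ingredients the abstract names: a tree model trained on human judgments, a per-metric importance, and a comparison of correlations.

```python
import random


def avg(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean; 0 for an empty list."""
    if not values:
        return 0.0
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, importance, depth=0, max_depth=4, min_leaf=5):
    """Greedy regression tree; adds each split's SSE reduction to importance."""
    node_sse, best = sse(y), None  # best = (gain, feature index, threshold)
    if depth < max_depth:
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X})[:-1]:
                left = [yi for row, yi in zip(X, y) if row[f] <= t]
                right = [yi for row, yi in zip(X, y) if row[f] > t]
                if min(len(left), len(right)) < min_leaf:
                    continue
                gain = node_sse - sse(left) - sse(right)
                if best is None or gain > best[0]:
                    best = (gain, f, t)
    if best is None or best[0] <= 0.0:
        return {"leaf": avg(y)}
    gain, f, t = best
    importance[f] += gain
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": fit_tree([X[i] for i in li], [y[i] for i in li],
                         importance, depth + 1, max_depth, min_leaf),
        "right": fit_tree([X[i] for i in ri], [y[i] for i in ri],
                          importance, depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean score."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


def pearson(a, b):
    ma, mb = avg(a), avg(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return cov / (sse(a) * sse(b)) ** 0.5


# Synthetic stand-in data: the simulated human score depends on two metrics.
rng = random.Random(0)
names = ["metric_a", "metric_b", "metric_c"]  # placeholder metric names


def make_rows(n):
    X, y = [], []
    for _ in range(n):
        a, b, c = rng.random(), rng.random(), rng.random()
        X.append([a, b, c])
        y.append(0.5 * a + 0.5 * b + rng.gauss(0, 0.05))
    return X, y


X_train, y_train = make_rows(300)
X_test, y_test = make_rows(200)

raw_importance = [0.0] * len(names)
tree = fit_tree(X_train, y_train, raw_importance)
weights = {n: g / sum(raw_importance) for n, g in zip(names, raw_importance)}

# Correlation with held-out "human" scores: fused metric vs. each single metric.
fused_r = pearson([predict(tree, row) for row in X_test], y_test)
single_r = {n: pearson([row[i] for row in X_test], y_test)
            for i, n in enumerate(names)}

print("importance:", weights)
print("fused r:", round(fused_r, 3), "single r:", single_r)
```

In practice a library tree ensemble would likely replace the hand-written tree; the point here is only that the fused score is a learned function of the elementary metrics, evaluated against human judgments it was not trained on.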
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
