Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains

Ben Malin; Tatiana Kalganova; Nikolaos Boulgouris

arXiv:2512.05700·cs.CL·December 8, 2025

Open Access

TL;DR

This paper introduces a fusion-based faithfulness metric for LLMs that combines multiple elementary metrics using a tree-based model. The fused metric aligns more closely with human judgments across all tested domains, which strengthens trustworthiness evaluation.

Contribution

The paper proposes a metric fusion strategy in which a tree-based model, driven by human judgments, identifies the importance of each elementary metric, improving faithfulness evaluation accuracy across multiple domains.

Findings

01 The fused metric correlates more strongly with human judgments.

02 The method improves faithfulness assessment across diverse datasets.

03 Dataset homogenization enables cross-domain evaluation.

Abstract

We present a methodology for improving the accuracy of faithfulness evaluation in Large Language Models (LLMs). The proposed methodology is based on the combination of elementary faithfulness metrics into a combined (fused) metric, for the purpose of improving the faithfulness of LLM outputs. The proposed strategy for metric fusion deploys a tree-based model to identify the importance of each metric, which is driven by the integration of human judgements evaluating the faithfulness of LLM responses. This fused metric is demonstrated to correlate more strongly with human judgements across all tested domains for faithfulness. Improving the ability to evaluate the faithfulness of LLMs allows greater confidence to be placed in models, which in turn allows them to be deployed in a greater diversity of scenarios. Additionally, we homogenise a collection of datasets across question answering…
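The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three metric columns, the synthetic data, the tree depth, and every function name below are assumptions made for illustration, and a hand-rolled regression tree stands in for whichever tree-based model the paper actually uses. The sketch fits a tree that predicts a human faithfulness score from elementary metric scores, reads off each metric's importance as its share of the error reduction, and checks how well the fused score correlates with human scores on held-out examples.

```python
import random

def sq_dev(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Return (gain, feature, threshold) of the split that most reduces squared error."""
    base, best = sq_dev(y), (0.0, None, None)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = base - sq_dev(left) - sq_dev(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def fit_tree(X, y, depth, importance):
    """Grow a regression tree, adding each split's error reduction to importance[feature]."""
    if depth == 0:
        return sum(y) / len(y)  # leaf: mean human score
    gain, f, t = best_split(X, y)
    if f is None:
        return sum(y) / len(y)  # no useful split left
    importance[f] += gain
    left = [(r, yi) for r, yi in zip(X, y) if r[f] <= t]
    right = [(r, yi) for r, yi in zip(X, y) if r[f] > t]
    return (f, t,
            fit_tree([r for r, _ in left], [yi for _, yi in left], depth - 1, importance),
            fit_tree([r for r, _ in right], [yi for _, yi in right], depth - 1, importance))

def predict(tree, row):
    """Walk from the root to a leaf; the leaf value is the fused score."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sq_dev(a) * sq_dev(b)) ** 0.5

# Synthetic stand-in data (assumption): three hypothetical elementary metrics,
# one tracking the human score closely, one loosely, one pure noise.
random.seed(0)
rows, human = [], []
for _ in range(200):
    h = random.random()
    rows.append([h + random.gauss(0, 0.1), h + random.gauss(0, 0.4), random.random()])
    human.append(h)

train_X, train_y = rows[:150], human[:150]
test_X, test_y = rows[150:], human[150:]

importance = [0.0, 0.0, 0.0]
tree = fit_tree(train_X, train_y, depth=3, importance=importance)
weights = [g / sum(importance) for g in importance]  # normalised metric importances

fused = [predict(tree, row) for row in test_X]
fused_corr = pearson(fused, test_y)
print("metric importances:", [round(w, 3) for w in weights])
print("held-out correlation of fused score with human scores:", round(fused_corr, 3))
```

Evaluating on held-out examples matters here because a tree can fit its training labels closely without generalising. The sketch uses Pearson correlation for brevity; the correlation measure used in the paper is not stated in the text above.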

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification