A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation
Weijia Zhang, Mohammad Aliannejadi, Jiahuan Pei, Yifei Yuan, Jia-Hong Huang, Evangelos Kanoulas

TL;DR
This paper evaluates the effectiveness of faithfulness metrics in distinguishing different levels of citation support in LLM-generated content, revealing current metrics' limitations and guiding future improvements.
Contribution
It introduces a comprehensive framework for assessing faithfulness metrics in fine-grained citation support scenarios, highlighting their inconsistent performance across support levels.
Findings
No single metric outperforms others across all evaluations.
Current metrics struggle to differentiate partial support from full or no support.
The evaluation framework provides a more nuanced understanding of metric effectiveness.
Abstract
Large language models (LLMs) often generate unsupported or unverifiable content, known as "hallucinations." To address this, retrieval-augmented LLMs are employed to include citations in their output, grounding it in verifiable sources. Despite such developments, manually assessing how well a citation supports the associated statement remains a major challenge. Previous studies tackle this challenge by leveraging faithfulness metrics to estimate citation support automatically. However, they limit this citation support estimation to a binary classification scenario, neglecting fine-grained citation support in practical scenarios. To investigate the effectiveness of faithfulness metrics in fine-grained scenarios, we propose a comparative evaluation framework that assesses the metric effectiveness in distinguishing citations between three-category support levels:…
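As a rough illustration of what "distinguishing between support levels" can mean in practice, the sketch below checks whether a metric's scores order citations the same way human labels do. It is not the paper's protocol. The level names ("full", "partial", "no"), their ordinal encoding, and both function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical ordinal encoding of three support levels (an assumption
# for this sketch, not taken from the paper).
LEVELS = {"no": 0, "partial": 1, "full": 2}


def pairwise_ordering_accuracy(scores, labels):
    """Fraction of differently-labeled pairs that the metric orders
    the same way as the human labels. Tied scores count as wrong."""
    correct = 0
    total = 0
    for (s_a, l_a), (s_b, l_b) in combinations(zip(scores, labels), 2):
        r_a, r_b = LEVELS[l_a], LEVELS[l_b]
        if r_a == r_b:
            continue  # same human label: no ordering to check
        total += 1
        if (s_a - s_b) * (r_a - r_b) > 0:
            correct += 1
    return correct / total if total else float("nan")


def mean_score_by_level(scores, labels):
    """Average metric score for each human-assigned support level."""
    sums, counts = {}, {}
    for score, label in zip(scores, labels):
        sums[label] = sums.get(label, 0.0) + score
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}
```

A metric that separates the levels well would give a pairwise accuracy near 1.0 and clearly spaced per-level means. A metric that blurs partial support into a neighbouring level would show a partial-support mean sitting close to the full or no-support mean.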
Taxonomy
Topics: Education and Islamic Studies · Religion, Society, and Development · Religion, Spirituality, and Psychology
