FACE: A Fine-Grained Reference-Free Evaluator for Conversational Information Access
Hideaki Joko, Faegheh Hasibi

TL;DR
FACE introduces a fine-grained, reference-free evaluation method for conversational information access systems, leveraging LLMs and optimization techniques to produce interpretable scores that strongly correlate with human judgments.
Contribution
FACE proposes an aspect-based evaluation framework that correlates more strongly with human judgments than existing methods and offers interpretable insights into system performance.
Findings
FACE achieves a system correlation of 0.9 with human judgments.
It outperforms state-of-the-art conversation evaluation methods.
The method's instructions are transferable across LLMs and datasets.
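To make "system correlation" concrete: it is commonly computed by averaging an evaluator's scores per system, averaging the human scores per system, and correlating the two lists of averages. The sketch below illustrates that idea under assumptions. The function names, the input layout, and the choice of Pearson's r are illustrative; this summary does not state which correlation coefficient the paper reports.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def system_correlation(metric_scores, human_scores):
    """Correlate per-system mean scores from a metric with human means.

    Both arguments map a system name to a list of per-conversation scores
    (a hypothetical layout chosen for this example).
    """
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return pearson(metric_means, human_means)


# Toy data: three systems whose metric means match the human means exactly.
toy_metric = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [4, 5, 6]}
toy_human = {"A": [2, 2], "B": [3, 3], "C": [5, 5]}
toy_r = system_correlation(toy_metric, toy_human)
```

A value near 1 means the evaluator ranks and spaces systems much as human judges do, even if individual conversation scores disagree.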
Abstract
A systematic, reliable, and low-cost evaluation of Conversational Information Access (CIA) systems remains an open challenge. Existing reference-based evaluation methods have proven insufficient for evaluating the dynamic nature of information access conversations, while existing LLM-based reference-free methods suffer from evaluation bias and limited generalizability. This work proposes FACE: a Fine-grained, Aspect-based Conversation Evaluation method that provides evaluation scores for diverse turn- and dialogue-level aspects of conversations. FACE leverages beam search and bandit optimization to select optimized LLM instructions per evaluation aspect. It assigns scores to atomic information units (particles) using the selected instructions and then aggregates them into a single score. We show that FACE achieves a strong correlation with human judgments, achieving system correlation of…
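The abstract names two mechanisms: a bandit procedure for choosing among candidate instructions, and aggregation of particle-level scores. The following is a minimal sketch of those two ideas only, not the authors' implementation. UCB1 is swapped in as one standard bandit algorithm because the abstract does not say which is used; the reward function, the candidate names, and the use of a plain mean for aggregation are all assumptions made for illustration, and the beam search step is omitted.

```python
import math


def ucb1_select(candidates, reward_fn, rounds, c=2.0):
    """Pick the candidate with the best average reward using UCB1.

    `reward_fn(candidate)` is a hypothetical callback returning a reward in
    [0, 1], e.g. agreement with human labels on a sampled example.
    """
    n = len(candidates)
    if rounds < n:
        raise ValueError("rounds must be at least the number of candidates")
    counts = [0] * n
    totals = [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1  # try every candidate once before using the bound
        else:
            arm = max(
                range(n),
                key=lambda k: totals[k] / counts[k]
                + math.sqrt(c * math.log(t) / counts[k]),
            )
        totals[arm] += reward_fn(candidates[arm])
        counts[arm] += 1
    best = max(range(n), key=lambda k: totals[k] / counts[k])
    return candidates[best]


def aggregate_particle_scores(scores):
    """Combine per-particle scores into one score (assumed: simple mean)."""
    if not scores:
        raise ValueError("no particle scores to aggregate")
    return sum(scores) / len(scores)


# Toy run with fixed rewards standing in for noisy evaluations.
toy_rewards = {"instruction_a": 0.2, "instruction_b": 0.9, "instruction_c": 0.5}
chosen = ucb1_select(list(toy_rewards), toy_rewards.get, rounds=30)
overall = aggregate_particle_scores([1.0, 0.0, 1.0])
```

In a real setting the reward would be noisy, which is where the exploration term matters: it keeps rarely tried instructions in play until their average is estimated well enough to rule them out.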
