PPLqa: An Unsupervised Information-Theoretic Quality Metric for Comparing Generative Large Language Models

Gerald Friedland; Xin Huang; Yueying Cui; Vishaal Kapoor; Ashish Khetan; Sanjiv Das

arXiv:2411.15320·cs.CL·November 26, 2024

Open Access

TL;DR

PPLqa is an unsupervised, language-independent information-theoretic metric for evaluating the quality of responses from large language models, enabling model ranking without ground truth annotations.

Contribution

It introduces PPLqa, a novel unsupervised metric that assesses LLM response quality, correlates with human judgments, and works effectively for long-form Q&A tasks.

Findings

1. PPLqa performs comparably to other related metrics.
2. It works better on long-form Q&A responses.
3. It bypasses the need for ground truth annotations.

Abstract

We propose PPLqa, an easy-to-compute, language-independent, information-theoretic metric that measures the quality of responses from generative Large Language Models (LLMs) in an unsupervised way, without requiring ground truth annotations or human supervision. The method and metric enable users to rank generative language models by response quality and so select the best model for a given task. Our single metric assesses LLMs with an approach that subsumes, but is not explicitly based on, coherence and fluency (quality of writing) and relevance and consistency (appropriateness of the response to the query). PPLqa performs as well as other related metrics and works better with long-form Q&A. Thus, PPLqa enables bypassing the lengthy annotation process required for ground truth evaluations, and it also correlates well with human and LLM rankings.
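The abstract's claim that a metric's model ranking "correlates well with human and LLM rankings" can be checked with a rank correlation. The sketch below is only an illustration of that kind of check, not the paper's evaluation code: it uses Spearman's rho, which is an assumed choice of statistic, and the function name `spearman_rho` and the model names are hypothetical. It assumes both rankings cover the same models and contain no ties.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free rankings.

    Each argument is a list of the same items ordered best to worst.
    Illustrative helper only; not taken from the PPLqa paper.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same distinct items")
    if len(ranking_a) != len(ranking_b):
        raise ValueError("rankings must have the same length")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    # Position of every item in the second ranking.
    position_b = {item: idx for idx, item in enumerate(ranking_b)}
    # Sum of squared rank differences across items.
    d_squared = sum((idx - position_b[item]) ** 2
                    for idx, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of four models (best first); made-up data.
metric_ranking = ["model_a", "model_b", "model_c", "model_d"]
human_ranking = ["model_a", "model_c", "model_b", "model_d"]
rho = spearman_rho(metric_ranking, human_ranking)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the unsupervised metric orders the models almost the same way the human annotators do; a value near -1 means the orderings are reversed.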

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies