A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis

Kentaro Imajo; Masanori Hirano; Shuji Suzuki; Hiroaki Mikami

arXiv:2502.09316 · cs.CL · February 14, 2025

TL;DR

This paper introduces a novel, scalable benchmark for evaluating large language models' open-ended text generation using n-gram statistics and rules, avoiding human or LLM-based judgments.

Contribution

It presents a new benchmark with three metrics (Fluency, Truthfulness, and Helpfulness) that correlates strongly with GPT-4o-based evaluations while requiring significantly fewer computational resources.

Findings

01. Strong correlation with GPT-4o-based evaluations

02. Requires significantly fewer computational resources than human or LLM-based assessment

03. Effective for scalable assessment of LLMs' open-ended generation

Abstract

Evaluating the open-ended text generation of large language models (LLMs) is challenging because of the lack of a clear ground truth and the high cost of human or LLM-based assessments. We propose a novel benchmark that evaluates LLMs using n-gram statistics and rules, without relying on human judgement or LLM-as-a-judge approaches. Using 50 question and reference answer sets, we introduce three new metrics based on n-grams and rules: Fluency, Truthfulness, and Helpfulness. Our benchmark strongly correlates with GPT-4o-based evaluations while requiring significantly fewer computational resources, demonstrating its effectiveness as a scalable alternative for assessing LLMs' open-ended generation capabilities.
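
As a rough illustration of how a generated answer can be scored against reference answers using n-gram statistics alone, the sketch below computes the fraction of a candidate's character n-grams that also appear in a pool of references. It is not the paper's definition of Fluency, Truthfulness, or Helpfulness; the function names (`char_ngrams`, `ngram_overlap`), the choice of character trigrams, and the clipped-count matching are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count every character n-gram in `text` (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_overlap(candidate, references, n=3):
    """Fraction of candidate n-grams that also occur in the references.

    Illustrative only: counts are clipped to the maximum count seen in
    any single reference, so repeating an n-gram does not inflate the score.
    """
    cand = char_ngrams(candidate, n)
    if not cand:
        return 0.0
    pooled = Counter()
    for ref in references:
        pooled |= char_ngrams(ref, n)  # keep the max count per n-gram
    matched = sum((cand & pooled).values())  # min of candidate and pooled counts
    return matched / sum(cand.values())


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "a cat sat on a mat"]
    print(ngram_overlap("the cat sat on a mat", refs))
    print(ngram_overlap("quantum chromodynamics", refs))
```

A score of this kind needs no judge model at evaluation time, only counting, which is why an n-gram-based benchmark can be much cheaper to run than an LLM-as-a-judge pipeline.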

Code & Models

Repositories

pfnet-research/pfgen-bench (official)

Taxonomy

Topics: Digital Rights Management and Security