Evaluating the Evaluation of Diversity in Commonsense Generation

Tianhui Zhang; Bei Peng; Danushka Bollegala

arXiv:2506.00514·cs.CL·June 3, 2025

TL;DR

This paper systematically evaluates various diversity metrics for commonsense generation, revealing that content-based metrics align better with human judgments and are more reliable than form-based metrics, which tend to overestimate diversity.

Contribution

It provides a comprehensive meta-evaluation of diversity metrics, introduces an LLM-annotated dataset, and recommends content-based metrics for future evaluations.

Findings

01. Form-based metrics overestimate diversity.

02. Content-based metrics correlate better with LLM ratings.

03. Content-based metrics are more reliable for diversity evaluation.

Abstract

In commonsense generation, given a set of input concepts, a model must generate a response that not only bears commonsense but also captures multiple diverse viewpoints. Numerous evaluation metrics based on form- and content-level overlap have been proposed in prior work for evaluating the diversity of a commonsense generation model. However, it remains unclear which metrics are best suited for evaluating diversity in commonsense generation. To address this gap, we conduct a systematic meta-evaluation of diversity metrics for commonsense generation. We find that form-based diversity metrics tend to consistently overestimate the diversity in sentence sets, where even randomly generated sentences are assigned overly high diversity scores. We then use a Large Language Model (LLM) to create a novel dataset annotated for the diversity of sentences generated for a…
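To make the claim about form-based metrics concrete, the sketch below computes distinct-n, a widely used form-based diversity score (unique n-grams divided by total n-grams). The abstract does not name the specific metrics it studies, so the choice of distinct-n, the helper name `distinct_n`, and the toy sentences and vocabulary are all assumptions made for illustration only, not the paper's setup.

```python
import random


def distinct_n(sentences, n=2):
    """Ratio of unique n-grams to total n-grams over a set of sentences."""
    ngrams = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # An empty set of n-grams is scored as zero diversity.
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Hypothetical toy data: three near-paraphrases of one commonsense statement.
paraphrases = [
    "the dog catches the frisbee in the park",
    "the dog catches the frisbee at the park",
    "the dog catches a frisbee in the park",
]

# Hypothetical toy data: word salad sampled uniformly from a small vocabulary.
rng = random.Random(0)
vocab = "dog frisbee park catch throw run grass ball jump owner tree sky".split()
random_sentences = [
    " ".join(rng.choice(vocab) for _ in range(8)) for _ in range(3)
]

print("paraphrases:", distinct_n(paraphrases))
print("random     :", distinct_n(random_sentences))
```

Because the score only counts surface n-grams, the meaningless random set scores higher than the coherent paraphrases, which is the kind of overestimation the abstract describes. A content-based metric would instead compare what the sentences mean, for example through sentence embeddings.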
