$T^5Score$: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets

Itamar Trainin; Omri Abend

arXiv:2407.17390·cs.CL·May 30, 2025

TL;DR

This paper introduces $T^5Score$, a new evaluation methodology for assessing the quality of multi-document topic sets generated by LLMs, addressing the limitations of existing practices and enabling reliable, high-agreement assessments.

Contribution

The paper presents $T^5Score$, a novel, decompositional evaluation framework for LLM-generated topics that improves reliability and inter-annotator agreement.

Findings

01 $T^5Score$ achieves high inter-annotator agreement.

02 It effectively decomposes topic quality into measurable aspects.

03 Experimental results validate its applicability across datasets.

Abstract

Using LLMs for Multi-Document Topic Extraction has recently gained popularity due to their apparent high-quality outputs, expressiveness, and ease of use. However, most existing evaluation practices are not designed for LLM-generated topics and result in low inter-annotator agreement scores, hindering the reliable use of LLMs for the task. To address this, we introduce $T^5Score$, an evaluation methodology that decomposes the quality of a topic set into quantifiable aspects, measurable through easy-to-perform annotation tasks. This framing enables a convenient, manual or automatic, evaluation procedure resulting in a strong inter-annotator agreement score. To substantiate our methodology and claims, we perform extensive experimentation on multiple datasets and report the results.
