SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text

Reshmi Ghosh; Tianyi Yao; Lizzy Chen; Sadid Hasan; Tianwei Chen; Dario Bernal; Huitian Jiao; H M Sajjad Hossain

arXiv:2411.16077 · cs.CL · November 26, 2024

Open Access

TL;DR

SAGEval introduces a novel reference-free evaluation framework for natural language generation that employs a critiquing agent to improve scoring accuracy without relying on ground-truth labels, especially in complex NLG tasks.

Contribution

The paper presents SAGEval, a new framework that enhances LLM-based NLG evaluation by using a critiquing agent to correct scores without needing reference data.

Findings

01. The critiquing agent effectively rectifies LLM evaluator scores.
02. SAGEval reduces dependence on labeled data for complex NLG tasks.
03. SAGEval improves evaluation accuracy in reference-free NLG scenarios.

Abstract

Large Language Model (LLM) integrations into applications like the Microsoft365 suite and Google Workspace for creating/processing documents, emails, presentations, etc. have led to considerable enhancements in productivity and time savings. But as these integrations become more complex, it is paramount to ensure that the output from LLM-integrated applications is relevant and appropriate for use. Identifying the need to develop robust evaluation approaches for natural language generation, wherein references/ground labels do not exist or are not amply available, this paper introduces a novel framework called "SAGEval", which utilizes a critiquing Agent to provide feedback on scores generated by LLM evaluators. We show that the critiquing Agent is able to rectify scores from LLM evaluators, in the absence of references/ground-truth labels, thereby reducing the need for labeled…
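The evaluator-plus-critic idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Critique`, `evaluate_with_critique`, `toy_evaluator`, and `toy_critic`, the 1-5 scoring scale, and the rule that the critic's suggested score simply replaces the evaluator's score are all assumptions made for illustration. In practice both callables would wrap LLM calls; here they are deterministic stand-ins so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    """Hypothetical feedback object returned by a critiquing agent."""
    agrees: bool
    suggested_score: Optional[int] = None
    feedback: str = ""


# Assumed interfaces: an evaluator maps text to a score, and a critic
# reviews (text, score) without access to any reference or label.
Evaluator = Callable[[str], int]
Critic = Callable[[str, int], Critique]


def evaluate_with_critique(text: str, evaluator: Evaluator, critic: Critic,
                           low: int = 1, high: int = 5) -> dict:
    """Score text, then let the critic confirm or rectify the score."""
    initial = evaluator(text)
    critique = critic(text, initial)
    if critique.agrees or critique.suggested_score is None:
        final = initial
    else:
        final = critique.suggested_score
    # Clamp to the assumed scoring scale.
    final = max(low, min(high, final))
    return {"initial": initial, "final": final, "feedback": critique.feedback}


def toy_evaluator(text: str) -> int:
    """Stand-in for an LLM evaluator that is overly generous."""
    return 5


def toy_critic(text: str, score: int) -> Critique:
    """Stand-in critic: objects to high scores on very short outputs."""
    if score >= 4 and len(text.split()) < 5:
        return Critique(False, 2, "Output is too short to justify a high score.")
    return Critique(True)


result = evaluate_with_critique("Rate us", toy_evaluator, toy_critic)
print(result["initial"], result["final"], result["feedback"])
```

Keeping the evaluator and the critic as separate callables means either one can be swapped for a real model call without changing the rectification logic, and the returned dictionary preserves the initial score so that the critic's effect stays visible.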

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Speech and dialogue systems