AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content
Thanh Vu, Richi Nayak, and Thiru Balasubramaniam

TL;DR
This paper presents Generative Agents that reliably simulate human judgment to evaluate AI-generated content, reducing costs and time compared to traditional human assessments.
Contribution
It introduces a novel automated evaluation method using Generative Agents to accurately assess AI content quality, streamlining the evaluation process.
Findings
Agents effectively mimic human ratings on content quality
Evaluation process is faster and more cost-efficient
Supports improved content generation for business use
Abstract
Modern businesses are increasingly challenged by the time and expense required to generate and assess high-quality content. Human writers face time constraints, and extrinsic evaluations can be costly. While Large Language Models (LLMs) offer potential in content creation, concerns about the quality of AI-generated content persist. Traditional evaluation methods, like human surveys, further add operational costs, highlighting the need for efficient, automated solutions. This research introduces Generative Agents as a means to tackle these challenges. These agents can rapidly and cost-effectively evaluate AI-generated content, simulating human judgment by rating aspects such as coherence, interestingness, clarity, fairness, and relevance. By incorporating these agents, businesses can streamline content generation and ensure consistent, high-quality output while minimizing reliance on…
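The abstract's idea of agents rating content on several criteria and acting as a proxy for human raters can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the 1 to 5 scale, the function names `aggregate` and `mean_abs_gap`, and the scores themselves, which are made up and are not results from the paper. Only the five criterion names come from the abstract.

```python
from statistics import mean

# Criteria named in the abstract.
CRITERIA = ["coherence", "interestingness", "clarity", "fairness", "relevance"]


def aggregate(ratings):
    """Average each criterion over a list of per-rater score dicts."""
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}


def mean_abs_gap(agent_avg, human_avg):
    """Mean absolute difference between agent and human averages."""
    return mean(abs(agent_avg[c] - human_avg[c]) for c in CRITERIA)


# Hypothetical scores on an assumed 1-5 scale (illustrative only).
agent_ratings = [
    {"coherence": 4, "interestingness": 3, "clarity": 5, "fairness": 4, "relevance": 4},
    {"coherence": 5, "interestingness": 3, "clarity": 4, "fairness": 4, "relevance": 5},
]
human_ratings = [
    {"coherence": 4, "interestingness": 4, "clarity": 4, "fairness": 4, "relevance": 4},
    {"coherence": 4, "interestingness": 3, "clarity": 5, "fairness": 5, "relevance": 5},
]

agent_avg = aggregate(agent_ratings)
human_avg = aggregate(human_ratings)
gap = mean_abs_gap(agent_avg, human_avg)
print(agent_avg)
print(human_avg)
print(f"mean absolute gap: {gap:.2f}")
```

A small gap on such a comparison would suggest that agent ratings track human ratings on these criteria; a real study would likely also use a correlation or agreement statistic over many items.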
Taxonomy
Topics: AI in Service Interactions · Artificial Intelligence in Healthcare and Education · Topic Modeling
