AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents

Sudip Dasgupta; Himanshu Shankar

arXiv:2506.22485 · cs.CL · July 1, 2025

TL;DR

This paper introduces a multi-agent AI system for automated, section-by-section review of enterprise documents, achieving high accuracy, consistency, and efficiency, surpassing human performance in key metrics.

Contribution

The study presents a novel modular multi-agent framework utilizing modern orchestration tools for comprehensive enterprise document assessment, with standardized outputs and iterative human-in-the-loop improvements.

Findings

01

Achieves 99% information consistency, compared with 92% for human reviewers.

02

Halves error and bias rates compared to manual reviews.

03

Reduces review time from 30 to 2.5 minutes per document.

Abstract

This study presents a modular, multi-agent system for the automated review of highly structured enterprise business documents using AI agents. Unlike prior solutions focused on unstructured texts or limited compliance checks, this framework leverages modern orchestration tools such as LangChain, CrewAI, TruLens, and Guidance to enable section-by-section evaluation of documents for accuracy, consistency, completeness, and clarity. Specialized agents, each responsible for discrete review criteria such as template compliance or factual correctness, operate in parallel or sequence as required. Evaluation outputs are enforced to a standardized, machine-readable schema, supporting downstream analytics and auditability. Continuous monitoring and a feedback loop with human reviewers allow for iterative system improvement and bias mitigation. Quantitative evaluation demonstrates that the AI…
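The architecture the abstract describes (specialized agents, one per review criterion, run section by section, with outputs forced into one machine-readable schema) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use LangChain, CrewAI, TruLens, or Guidance. It uses only the Python standard library, and every name in it (`Finding`, `completeness_agent`, `clarity_agent`, `review`, the schema fields) is a hypothetical chosen for illustration. The toy rules stand in for what would presumably be LLM calls in the real system.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from typing import Callable

# The four criteria come from the paper title; everything else is assumed.
CRITERIA = ("accuracy", "consistency", "completeness", "clarity")


@dataclass(frozen=True)
class Finding:
    """One agent's verdict on one section, in a fixed schema (assumed fields)."""
    section: str
    criterion: str
    passed: bool
    comment: str

    def __post_init__(self):
        # Reject records that fall outside the agreed schema.
        if self.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {self.criterion!r}")


def completeness_agent(section: str, text: str) -> Finding:
    # Toy rule standing in for a model call: a section must not be empty.
    ok = bool(text.strip())
    return Finding(section, "completeness", ok, "" if ok else "section is empty")


def clarity_agent(section: str, text: str) -> Finding:
    # Toy rule standing in for a model call: flag placeholder text.
    ok = "TBD" not in text
    return Finding(section, "clarity", ok, "" if ok else "contains placeholder 'TBD'")


Agent = Callable[[str, str], Finding]
AGENTS: list[Agent] = [completeness_agent, clarity_agent]


def review(document: dict[str, str]) -> list[dict]:
    """Run every agent on every section in parallel; return schema-conformant records."""
    jobs = [(agent, name, text) for name, text in document.items() for agent in AGENTS]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves job order, so the output order is deterministic.
        findings = pool.map(lambda job: job[0](job[1], job[2]), jobs)
        return [asdict(f) for f in findings]


if __name__ == "__main__":
    doc = {"Scope": "Covers billing services.", "Risks": "TBD"}
    print(json.dumps(review(doc), indent=2))
```

Because every record has the same four fields, the results can be serialized to JSON and aggregated or audited downstream without per-agent parsing, which is the property the abstract attributes to the standardized schema.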

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Multi-Agent Systems and Negotiation · Topic Modeling