AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents
Sudip Dasgupta, Himanshu Shankar

TL;DR
This paper introduces a multi-agent AI system for automated, section-by-section review of enterprise documents that achieves high accuracy, consistency, and efficiency and surpasses human reviewers on key metrics.
Contribution
The study presents a novel modular multi-agent framework utilizing modern orchestration tools for comprehensive enterprise document assessment, with standardized outputs and iterative human-in-the-loop improvements.
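A "standardized output" here means that every agent must return a result in the same machine-readable shape, and that malformed results are rejected rather than passed downstream. The following is a minimal plain-Python sketch of that idea, not the authors' schema. The field names (`section_id`, `criterion`, `score`, `issues`), the 0 to 1 score scale, and the helpers `parse_agent_output` and `to_json` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# The four review criteria named in the paper's title.
CRITERIA = {"accuracy", "consistency", "completeness", "clarity"}


@dataclass
class SectionReview:
    """Hypothetical per-section, per-criterion result record."""
    section_id: str
    criterion: str
    score: float  # assumed scale: 0.0 (fails) to 1.0 (fully meets)
    issues: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject outputs that drift from the schema instead of storing them.
        if self.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {self.criterion!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


def parse_agent_output(raw: str) -> SectionReview:
    """Parse an agent's JSON reply into a validated SectionReview."""
    data = json.loads(raw)
    return SectionReview(
        section_id=str(data["section_id"]),
        criterion=str(data["criterion"]),
        score=float(data["score"]),
        issues=[str(item) for item in data.get("issues", [])],
    )


def to_json(review: SectionReview) -> str:
    """Serialize with sorted keys so stored records are stable for analytics and audit."""
    return json.dumps(asdict(review), sort_keys=True)


if __name__ == "__main__":
    reply = '{"section_id": "2.1", "criterion": "clarity", "score": 0.8, "issues": ["Undefined acronym: SLA"]}'
    print(to_json(parse_agent_output(reply)))
```

Validating at parse time keeps bad agent replies out of the record store. A flagged reply could then be re-prompted or routed to a human reviewer, which is one plausible place for the human-in-the-loop step to attach.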
Findings
Achieves 99% information consistency, compared with 92% for human reviewers.
Halves error and bias rates compared with manual review.
Reduces review time from 30 to 2.5 minutes per document.
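The two reported review times translate into the following speedup and relative time saving per document:

\[
\frac{30\ \text{min}}{2.5\ \text{min}} = 12\times, \qquad
\frac{30 - 2.5}{30} = \frac{27.5}{30} \approx 91.7\%\ \text{less review time.}
\]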
Abstract
This study presents a modular, multi-agent system for the automated review of highly structured enterprise business documents using AI agents. Unlike prior solutions focused on unstructured texts or limited compliance checks, this framework leverages modern orchestration tools such as LangChain, CrewAI, TruLens, and Guidance to enable section-by-section evaluation of documents for accuracy, consistency, completeness, and clarity. Specialized agents, each responsible for a discrete review criterion such as template compliance or factual correctness, operate in parallel or in sequence as required. Evaluation outputs are constrained to a standardized, machine-readable schema, supporting downstream analytics and auditability. Continuous monitoring and a feedback loop with human reviewers enable iterative system improvement and bias mitigation. Quantitative evaluation demonstrates that the AI…
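The abstract describes specialized agents that each own one review criterion and run either in parallel or in sequence. The sketch below illustrates only that control flow, under loud assumptions. The agent functions, their rule-based checks, and their thresholds are hypothetical stand-ins for LLM-backed agents. Python's `concurrent.futures` thread pool replaces the orchestration tools (LangChain, CrewAI) the authors actually use. This does not reproduce the authors' implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Finding = Dict[str, object]
SectionAgent = Callable[[str, str], Finding]


def completeness_agent(section_id: str, text: str) -> Finding:
    # Hypothetical rule: a section with fewer than 20 words is treated as incomplete.
    words = len(text.split())
    return {"section_id": section_id, "criterion": "completeness",
            "passed": words >= 20, "detail": f"{words} words"}


def clarity_agent(section_id: str, text: str) -> Finding:
    # Hypothetical rule: flag any sentence longer than 30 words.
    # Split only on whitespace after . ! ? so that numbers like "2.0" stay intact.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    long_ones = [s for s in sentences if len(s.split()) > 30]
    return {"section_id": section_id, "criterion": "clarity",
            "passed": not long_ones, "detail": f"{len(long_ones)} long sentence(s)"}


def consistency_agent(sections: Dict[str, str]) -> Finding:
    # A cross-section check needs every section, so it runs after the parallel pass.
    # Hypothetical rule: every "Version: X" mention in the document must agree.
    versions = set()
    for text in sections.values():
        versions.update(re.findall(r"Version:\s*(\d+(?:\.\d+)*)", text))
    return {"section_id": "*", "criterion": "consistency",
            "passed": len(versions) <= 1, "detail": sorted(versions)}


def review_document(sections: Dict[str, str], agents: List[SectionAgent]) -> List[Finding]:
    # Stage 1 (parallel): every per-section agent reviews every section independently.
    jobs = [(agent, sid, text) for sid, text in sections.items() for agent in agents]
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(lambda job: job[0](job[1], job[2]), jobs))
    # Stage 2 (sequential): document-level checks that depend on all sections.
    findings.append(consistency_agent(sections))
    return findings


if __name__ == "__main__":
    doc = {
        "1": "Version: 2.0 " + "This section describes the scope of the service in detail. " * 3,
        "2": "Version: 2.1 Short section.",
    }
    for finding in review_document(doc, [completeness_agent, clarity_agent]):
        print(finding)
```

Splitting the work into an independent per-section stage and a dependent document-level stage is what makes both execution modes necessary. Per-section agents share no state and can fan out, while cross-section consistency must wait for every section to be available.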
