The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data

Sunith Vallabhaneni; Thomas Berkane; Maimuna Majumder

arXiv:2512.21481 · cs.MA · December 29, 2025

TL;DR

The paper introduces the AI Committee, a multi-agent system utilizing large language models to automate validation and remediation of web-sourced datasets, significantly improving data quality and reducing manual effort.

Contribution

It presents a novel, model-agnostic multi-agent framework that automates data validation and correction without task-specific training, outperforming baseline methods.

Findings

1. Achieves up to 78.7% data completeness
2. Attains 100% precision in data validation
3. Generalizes across different LLMs
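This summary does not define its metrics. As a rough illustration only, the sketch below assumes that completeness is the share of (record, field) cells holding a non-missing value, and that validation precision is the share of values flagged as invalid that truly are invalid (TP / (TP + FP)). The function names and the record format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set

# Hypothetical record format: field name -> extracted value (None means missing).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> float:
    """Fraction of (record, field) cells that hold a non-empty value.

    Assumption: this is one plausible reading of "data completeness",
    not necessarily the paper's definition.
    """
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def precision(flagged: Set[int], truly_invalid: Set[int]) -> float:
    """TP / (TP + FP) over item ids flagged as invalid.

    Precision is undefined when nothing is flagged; it is treated as 0.0 here.
    """
    if not flagged:
        return 0.0
    true_positives = len(flagged & truly_invalid)
    return true_positives / len(flagged)
```

For example, two records with fields `name` and `date`, one of which lacks a date, give a completeness of 3/4. Under this reading, 100% precision means that every value the system flagged was in fact invalid. It says nothing about how many invalid values went unflagged, which is what recall would measure.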

Abstract

Many research areas rely on data from the web to gain insights and test their methods. However, collecting comprehensive research datasets often requires manually reviewing many web pages to identify and record relevant data points, a process that is labor-intensive and error-prone. While the emergence of large language model (LLM)-powered web agents has begun to automate parts of this process, these agents often struggle to ensure the validity of the data they collect. Indeed, they exhibit several recurring failure modes, including hallucinating or omitting values, misinterpreting page semantics, and failing to detect invalid information, which are subtle and difficult to detect and correct manually. To address this, we introduce the AI Committee, a novel model-agnostic multi-agent system that automates the process of validating and remediating web-sourced datasets. Each agent is…
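The abstract is truncated before it describes the individual agents, so the following is only a hypothetical sketch of how a model-agnostic validate-then-remediate loop could be wired. It assumes three things: each agent is a plain Python callable, so any LLM client or a stub can back it; a "committee" decision is a simple majority vote among validators; and a remediator proposes a replacement value from the page text. `review`, `Report`, `Validator`, and `Remediator` are illustrative names, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical agent signatures; any LLM-backed function (or a stub) can fill these roles.
Validator = Callable[[str, str, str], bool]       # (field, value, source_text) -> is value supported?
Remediator = Callable[[str, str], Optional[str]]  # (field, source_text) -> proposed value or None


@dataclass
class Report:
    record: Dict[str, Optional[str]]                     # record after remediation
    flagged: List[str] = field(default_factory=list)     # fields the validators rejected
    remediated: List[str] = field(default_factory=list)  # fields filled in or corrected


def review(record: Dict[str, Optional[str]], source_text: str,
           validators: List[Validator], remediate: Remediator) -> Report:
    """Flag values that lack majority approval, then try to repair flagged or missing fields.

    Assumption: majority voting stands in for whatever decision rule the paper uses.
    """
    fixed = dict(record)  # never mutate the caller's record
    report = Report(record=fixed)
    for name, value in record.items():
        if value is None:
            needs_fix = True  # omitted value: go straight to remediation
        else:
            approvals = sum(v(name, value, source_text) for v in validators)
            needs_fix = approvals * 2 <= len(validators)  # no strict majority approves
            if needs_fix:
                report.flagged.append(name)
        if needs_fix:
            proposal = remediate(name, source_text)
            if proposal is not None and proposal != value:
                fixed[name] = proposal
                report.remediated.append(name)
    return report
```

Suppose the validators are stubs that check whether a value appears verbatim in the page text. A hallucinated date that is absent from the page is then flagged and replaced, while an omitted field is passed directly to the remediator. Because the agents are plain callables in this sketch, switching to a different LLM only changes the functions passed in. This is one simple way to read "model-agnostic".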

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis