The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data
Sunith Vallabhaneni, Thomas Berkane, Maimuna Majumder

TL;DR
The paper introduces the AI Committee, a multi-agent system utilizing large language models to automate validation and remediation of web-sourced datasets, significantly improving data quality and reducing manual effort.
Contribution
It presents a novel, model-agnostic multi-agent framework that automates data validation and correction without task-specific training, outperforming baseline methods.
Findings
Achieves up to 78.7% data completeness
Attains 100% precision in data validation
Generalizes across different LLMs
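The summary does not say how completeness and precision were measured. The sketch below is a hedged illustration only. It assumes that completeness is the fraction of target cells holding a non-empty value, and that precision is the standard TP / (TP + FP) ratio for a binary decision such as flagging a value as invalid. The function names `completeness` and `precision` are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence


def completeness(cells: Sequence[Optional[str]]) -> float:
    """Fraction of target cells holding a non-empty value.

    ASSUMPTION: this is an illustrative definition, not the paper's.
    """
    if not cells:
        return 0.0
    filled = sum(1 for c in cells if c is not None and c.strip() != "")
    return filled / len(cells)


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Standard precision, TP / (TP + FP), for a binary decision.

    Returns 0.0 when nothing was predicted positive (one common convention).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

For example, `completeness(["12", None, "", "7"])` counts two filled cells out of four. Under these assumptions, a precision of 100% would mean that no value was flagged incorrectly. It says nothing about how many problems were missed, which recall would measure.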
Abstract
Many research areas rely on data from the web to gain insights and test their methods. However, collecting comprehensive research datasets often demands manually reviewing many web pages to identify and record relevant data points, which is labor-intensive and susceptible to error. While the emergence of large language model (LLM)-powered web agents has begun to automate parts of this process, they often struggle to ensure the validity of the data they collect. Indeed, these agents exhibit several recurring failure modes (hallucinating or omitting values, misinterpreting page semantics, and failing to detect invalid information) that are subtle and difficult to detect and correct manually. To address this, we introduce the AI Committee, a novel model-agnostic multi-agent system that automates the process of validating and remediating web-sourced datasets. Each agent is…
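The abstract is cut off before it describes the individual agents, so what follows is a hypothetical illustration and not the authors' design. It shows one way a validate-then-remediate loop can be made model-agnostic: every agent depends only on a generic text-in/text-out callable (`LLM`), so any model backend can be plugged in. The names `validate_field`, `remediate_field`, `review_record`, and `Verdict`, as well as the prompt wording, are all invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Any LLM backend can be wrapped as a plain prompt -> completion function.
# Depending only on this signature is what keeps the sketch model-agnostic.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    field: str
    valid: bool
    fixed_value: Optional[str] = None  # set only when remediation found a value


def validate_field(llm: LLM, field: str, value: str, page_text: str) -> bool:
    """Hypothetical validator agent: asks whether the page supports the value."""
    prompt = (f"Page:\n{page_text}\n\nDoes the page support {field} = {value!r}? "
              "Answer YES or NO.")
    return llm(prompt).strip().upper().startswith("YES")


def remediate_field(llm: LLM, field: str, page_text: str) -> Optional[str]:
    """Hypothetical remediation agent: re-extracts the value, or None if absent."""
    prompt = (f"Page:\n{page_text}\n\nWhat is the value of {field}? "
              "Reply NONE if the page does not state it.")
    answer = llm(prompt).strip()
    return None if answer.upper() == "NONE" else answer


def review_record(llm: LLM, record: Dict[str, str], page_text: str) -> Dict[str, Verdict]:
    """Validate every field of one scraped record and try to repair rejected ones."""
    verdicts: Dict[str, Verdict] = {}
    for field, value in record.items():
        if validate_field(llm, field, value, page_text):
            verdicts[field] = Verdict(field, True)
        else:
            verdicts[field] = Verdict(field, False, remediate_field(llm, field, page_text))
    return verdicts
```

Because the agents receive the model as an argument, you can test the whole loop with a deterministic stub function in place of a real LLM. Swapping models then means passing a different callable, with no change to the agent code.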
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Web Data Mining and Analysis
