Label Curation Using Agentic AI
Subhodeep Ghosh, Bayan Divaaniaazar, Md Ishat-E-Rabban, Spencer Clarke, Senjuti Basu Roy

TL;DR
AURA is an agentic AI framework for large-scale, multi-modal data annotation that jointly infers true labels and annotator reliability without ground truth, improving accuracy by up to 5.8% over baselines.
Contribution
The paper introduces AURA, an agentic AI system that automates data annotation by modeling annotator reliability and aggregating noisy labels without ground truth.
Findings
AURA achieves up to 5.8% higher accuracy than the baselines.
The improvement reaches up to 50% in challenging scenarios with poor-quality annotators.
AURA accurately estimates annotator reliability without any pre-validation step.
Abstract
Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of…
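The aggregation step described in the abstract (latent true labels, per-annotator confusion matrices, Expectation-Maximization) can be illustrated with a minimal sketch. The code below assumes a classical Dawid-Skene-style formulation; the source does not name the model it adapts, and this is not AURA's implementation. The function name `dawid_skene`, the `smoothing` parameter, the use of `-1` for missing labels, and the synthetic annotators are all hypothetical choices made for illustration.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Illustrative EM aggregation (assumed Dawid-Skene style, not AURA's code).

    labels: (n_items, n_annotators) int array; -1 marks a missing label.
    Returns predicted labels, confusion matrices [annotator, true, observed],
    and the posterior over true labels.
    """
    labels = np.asarray(labels)
    n_items, n_annot = labels.shape
    observed = labels >= 0

    # Initialise posteriors over true labels from per-item vote fractions.
    post = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        post[:, k] = (labels == k).sum(axis=1)
    post = (post + smoothing) / (post + smoothing).sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator.
        prior = post.mean(axis=0)
        conf = np.full((n_annot, n_classes, n_classes), smoothing)
        for a in range(n_annot):
            for l in range(n_classes):
                mask = labels[:, a] == l
                conf[a, :, l] += post[mask].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over true labels, accumulated in log space.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for a in range(n_annot):
            rows = observed[:, a]
            log_post[rows] += np.log(conf[a][:, labels[rows, a]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post.argmax(axis=1), conf, post

# Synthetic example (hypothetical data): three fairly reliable annotators
# and two unreliable ones labelling 300 items over 3 classes.
rng = np.random.default_rng(0)
truth = rng.integers(0, 3, size=300)
accuracies = [0.9, 0.85, 0.8, 0.4, 0.35]
votes = np.empty((300, 5), dtype=int)
for a, acc in enumerate(accuracies):
    correct = rng.random(300) < acc
    noise = rng.integers(0, 3, size=300)
    votes[:, a] = np.where(correct, truth, noise)

pred, conf, _ = dawid_skene(votes, n_classes=3)
# Mean diagonal of each confusion matrix as a simple reliability score.
reliability = conf.diagonal(axis1=1, axis2=2).mean(axis=1)
print("aggregated accuracy:", (pred == truth).mean())
print("estimated reliability per annotator:", reliability.round(2))
```

The ground-truth array is used here only to score the result; the estimation itself sees nothing but the noisy votes, which mirrors the "without ground truth" setting the abstract describes. In an agentic setting, each column of `votes` would presumably hold the output of one AI agent rather than one human annotator.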
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Data Classification · Reliability and Agreement in Measurement
