Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models

Luis de-Marcos; Manuel Goyanes; Adrián Domínguez-Díaz

arXiv:2603.06197·cs.CL·March 9, 2026

Open Access

TL;DR

This paper introduces the AI-CROWD protocol, which uses ensemble large language models to approximate ground truth in large-scale content analysis, addressing the impracticality of manual labeling for massive datasets.

Contribution

The paper presents a novel ensemble-based protocol that leverages multiple LLMs to generate consensus labels, providing a scalable alternative to traditional ground truth creation.

Findings

01. AI-CROWD effectively approximates ground truth in large datasets.

02. Consensus and disagreement patterns help identify high-confidence labels.

03. The protocol highlights potential ambiguities and biases in model outputs.

Abstract

Large-scale content analysis is increasingly limited by the absence of observable ground truth or gold-standard labels: creating such benchmarks through extensive human coding becomes impractical for massive datasets because of the time and cost involved and the difficulty of keeping coders consistent. To overcome this barrier, we introduce the AI-CROWD protocol, which approximates ground truth by leveraging the collective outputs of an ensemble of large language models (LLMs). Rather than asserting that the resulting labels are true ground truth, the protocol generates a consensus-based approximation derived from convergent and divergent inferences across multiple models. By aggregating outputs via majority voting and interrogating agreement/disagreement patterns with diagnostic metrics, AI-CROWD identifies high-confidence classifications while flagging potential ambiguity or model-specific biases.
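
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the model names, the example labels, and the 0.8 agreement threshold are all assumptions made for illustration. The abstract also does not name its diagnostic metrics, so a simple per-item agreement share stands in for them here.

```python
from collections import Counter


def majority_vote(labels):
    """Return (consensus_label, agreement_share) for one item.

    labels: one label per model. A tie for first place yields a
    consensus of None, since no single label wins the vote.
    """
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(labels)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share


def aggregate(model_outputs, threshold=0.8):
    """Aggregate per-model labels into consensus labels.

    model_outputs: dict mapping a model name to its list of labels,
    with all lists in the same item order.
    threshold: hypothetical cut-off for calling a label high-confidence.
    """
    n_items = len(next(iter(model_outputs.values())))
    results = []
    for i in range(n_items):
        votes = [outputs[i] for outputs in model_outputs.values()]
        label, share = majority_vote(votes)
        results.append({
            "item": i,
            "label": label,
            "agreement": share,
            "high_confidence": label is not None and share >= threshold,
        })
    return results


# Hypothetical outputs from three models on three items.
model_outputs = {
    "model_a": ["pos", "neg", "pos"],
    "model_b": ["pos", "neg", "neg"],
    "model_c": ["pos", "pos", "neu"],
}
results = aggregate(model_outputs)
for row in results:
    print(row)
```

In this toy input the first item is unanimous and is flagged as high-confidence, the second has a two-to-one majority that falls below the assumed threshold, and the third is a three-way split with no consensus label. Items in the last two groups are the ones a protocol of this kind would surface as potentially ambiguous.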


Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education