Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, and Dan Jurafsky

TL;DR
This paper introduces Confidence-Driven Inference, a method that combines LLM annotations and confidence indicators to reduce human annotation needs while maintaining valid, accurate statistical estimates in computational social science tasks.
Contribution
The paper presents a novel approach that strategically integrates LLM confidence metrics with annotations to optimize data collection and ensure valid conclusions.
Findings
Reduces the number of human annotations needed by over 25% in computational social science (CSS) tasks
Guarantees valid and accurate conclusions despite using LLM annotations
Applicable to a broad range of NLP estimation problems
Abstract
Large language models (LLMs) have shown high agreement with human raters across a variety of tasks, demonstrating potential to ease the challenges of human data collection. In computational social science (CSS), researchers are increasingly leveraging LLM annotations to complement slow and expensive human annotations. Still, guidelines for collecting and using LLM annotations, without compromising the validity of downstream conclusions, remain limited. We introduce Confidence-Driven Inference: a method that combines LLM annotations and LLM confidence indicators to strategically select which human annotations should be collected, with the goal of producing accurate statistical estimates and provably valid confidence intervals while reducing the number of human annotations needed. Our approach comes with safeguards against LLM annotations of poor quality, guaranteeing that the conclusions…
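The idea in the abstract can be illustrated with a minimal sketch for estimating a mean. This is not the paper's exact procedure: it is a simplified inverse-probability-weighted estimator that assumes i.i.d. data, binary or numeric labels, and LLM confidences in [0, 1]. The names `confidence_driven_mean`, `ask_human`, `budget_frac`, and `min_prob` are hypothetical and chosen only for illustration. Each item is sent to a human with a probability that grows as the LLM's confidence drops, and the LLM label is corrected by the reweighted human label where one was collected.

```python
import math
import random


def confidence_driven_mean(llm_labels, llm_conf, ask_human,
                           budget_frac=0.3, min_prob=0.05, seed=0):
    """Sketch: estimate the mean human label using LLM labels plus a
    confidence-guided random subset of human labels.

    All names and defaults here are illustrative assumptions,
    not taken from the paper.
    """
    n = len(llm_labels)
    if n < 2:
        raise ValueError("need at least two items")
    rng = random.Random(seed)

    # Sampling probability grows with LLM uncertainty (1 - confidence),
    # rescaled so roughly budget_frac of items get a human label.
    # The floor min_prob keeps the inverse weights bounded.
    unc = [1.0 - c for c in llm_conf]
    mean_unc = max(sum(unc) / n, 1e-12)
    probs = [min(1.0, max(min_prob, budget_frac * u / mean_unc)) for u in unc]

    terms = []
    n_human = 0
    for i in range(n):
        f = llm_labels[i]
        if rng.random() < probs[i]:
            y = ask_human(i)  # collect the human annotation for item i
            n_human += 1
            # Inverse-probability-weighted correction of the LLM label;
            # in expectation over sampling this term equals y.
            terms.append(f + (y - f) / probs[i])
        else:
            terms.append(f)

    est = sum(terms) / n
    var = sum((t - est) ** 2 for t in terms) / (n - 1)
    half = 1.96 * math.sqrt(var / n)  # normal-approximation 95% interval
    return est, (est - half, est + half), n_human


if __name__ == "__main__":
    # Synthetic demo: the "LLM" is wrong more often when it is unconfident.
    rng = random.Random(1)
    truth = [int(rng.random() < 0.4) for _ in range(2000)]
    conf = [rng.uniform(0.5, 1.0) for _ in truth]
    llm = [t if rng.random() < c else 1 - t for t, c in zip(truth, conf)]
    print(confidence_driven_mean(llm, conf, lambda i: truth[i]))
```

Because every sampling probability is strictly positive, the estimate stays unbiased even when the LLM labels are poor; bad labels only widen the interval. When the LLM labels agree with the human ones, the correction terms vanish and the interval tightens, which is where the savings in human annotations would come from.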
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Library Science and Information Systems
