What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling

Jingyuan Liu

arXiv:2604.16413 · cs.CY · April 21, 2026

TL;DR

This paper introduces Inter-Prompt Reliability (IPR), a framework for measuring how stable LLM annotations are across semantically equivalent but differently worded prompts. It finds substantial stochastic variation, especially in interpretative tasks, and recommends aggregating labels across prompts.

Contribution

The paper proposes the IPR framework, which evaluates the reliability of LLM annotation across semantically equivalent prompts using the Pairwise Agreement Rate (PAR) and its distribution. It also shows that aggregating labels across prompts improves reproducibility in social science research.

Findings

01. LLM annotations show high stochastic variation in interpretative tasks.

02. Majority voting across prompts improves reproducibility.

03. Prompt wording introduces methodological uncertainty.
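The majority-voting finding can be illustrated with a minimal sketch. The function name `majority_vote`, the input layout (one list of labels per prompt, aligned by item), and the tie-breaking rule are assumptions made for illustration; the paper's own aggregation procedure may differ.

```python
from collections import Counter


def majority_vote(labels_per_prompt):
    """Aggregate labels across prompts by majority vote (illustrative only).

    labels_per_prompt: list of label lists, one list per prompt variant,
    all of the same length and aligned so that index i is the same item.
    """
    n_items = len(labels_per_prompt[0])
    aggregated = []
    for i in range(n_items):
        # Count the labels that each prompt variant assigned to item i.
        votes = Counter(run[i] for run in labels_per_prompt)
        # Assumption: ties are broken in favour of the label seen first,
        # which is how Counter.most_common orders equal counts.
        aggregated.append(votes.most_common(1)[0][0])
    return aggregated


# Hypothetical labels from three prompt variants on three items.
example = [
    ["A", "B", "C"],
    ["A", "B", "B"],
    ["B", "B", "C"],
]
print(majority_vote(example))
```

With an odd number of prompt variants and binary labels, ties cannot occur; with more labels or an even number of prompts, the tie-breaking rule becomes a design choice that should be reported.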

Abstract

Large language models (LLMs) are increasingly used for annotation in computational social science, yet their methodological reliability under prompt variation remains unclear. This paper introduces Inter-Prompt Reliability (IPR), a framework for evaluating the stability of LLM outputs across semantically equivalent but linguistically varied prompts. Drawing on Inter-Rater Reliability, IPR is measured by Pairwise Agreement Rate (PAR) and its distribution to capture both consistency and stochasticity in model behavior. We evaluate this framework on two tasks with distinct properties: TREC (interpretative) and Politifact (knowledge-anchored). Results show that LLM annotation exhibits substantial stochastic variation in interpretative tasks, while appearing more stable in knowledge-based tasks. We further show that majority voting across prompts significantly improves reproducibility and…
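The abstract names the Pairwise Agreement Rate (PAR) but does not give its formula here. One plausible reading, assumed in the sketch below, is that for each pair of prompt variants PAR is the fraction of items that receive the same label, and that the "distribution" is the set of these rates over all prompt pairs. The function name and data layout are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_agreement_rates(labels_per_prompt):
    """Return {(i, j): agreement rate} for every pair of prompt variants.

    Assumed definition (not confirmed by the source): the share of items
    on which prompt i and prompt j produce the same label.
    """
    rates = {}
    for (i, run_i), (j, run_j) in combinations(enumerate(labels_per_prompt), 2):
        if len(run_i) != len(run_j):
            raise ValueError("all prompts must label the same items")
        matches = sum(a == b for a, b in zip(run_i, run_j))
        rates[(i, j)] = matches / len(run_i)
    return rates


# Hypothetical labels from three prompt variants on three items.
example = [
    ["A", "B", "C"],
    ["A", "B", "B"],
    ["B", "B", "C"],
]
rates = pairwise_agreement_rates(example)
print(rates)                 # per-pair agreement, i.e. the PAR distribution
print(mean(rates.values()))  # a single summary of consistency
```

Reporting the spread of the per-pair rates alongside their mean would reflect the abstract's aim of capturing both consistency and stochasticity, though which summary statistics the paper uses is not stated in this excerpt.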
