Variance-Aware LLM Annotation for Strategy Research: Sources, Diagnostics, and a Protocol for Reliable Measurement

Arnaldo Camuffo; Alfonso Gambardella; Saeid Kazemi; Jakub Malachowski; Abhinav Pandey

arXiv:2601.02370·cs.CY·January 21, 2026

Open Access

TL;DR

This paper identifies sources of variance in LLM-generated annotations for strategy research, demonstrating how design choices impact results and proposing a protocol for reliable, reproducible measurement using LLMs.

Contribution

It introduces a variance-aware protocol for LLM annotation, addressing instability issues and establishing standards for reliable measurement in strategy research.

Findings

01 Minor design choices can shift outcomes by 12-85 percentage points

02 Variance sources threaten reproducibility and bias econometric estimates

03 Proposes a protocol with sampling, aggregation, and reporting standards
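As a rough illustration of what sampling, aggregation, and reporting could look like in practice, the sketch below annotates each item several times, aggregates by majority vote, and reports the agreement rate alongside the label. The majority-vote rule, the function names `aggregate_labels` and `summarize`, and the 0.8 agreement threshold are assumptions made for this example; this summary does not state the paper's actual rules or budgets.

```python
from collections import Counter


def aggregate_labels(samples):
    """Majority vote over repeated annotations of one item.

    Returns (label, agreement), where agreement is the share of samples
    matching the winning label. Ties go to the label seen first.
    """
    if not samples:
        raise ValueError("need at least one sample per item")
    counts = Counter(samples)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(samples)


def summarize(runs, min_agreement=0.8):
    """Build a per-item report from repeated labels.

    runs: dict mapping item id -> list of labels from repeated LLM calls.
    min_agreement: hypothetical threshold below which an item is flagged.
    """
    report = {}
    for item_id, samples in runs.items():
        label, agreement = aggregate_labels(samples)
        report[item_id] = {
            "label": label,
            "agreement": agreement,
            "n_samples": len(samples),
            "flagged": agreement < min_agreement,
        }
    return report


# Made-up labels standing in for five repeated annotations per document.
runs = {
    "doc1": ["yes", "yes", "yes", "no", "yes"],
    "doc2": ["yes", "no", "no", "yes", "no"],
}
report = summarize(runs)
for item_id, row in report.items():
    print(item_id, row)
```

Reporting the agreement rate and the number of samples next to each label keeps the instability visible downstream instead of hiding it behind a single deterministic-looking value.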

Abstract

Large language models (LLMs) offer strategy researchers powerful tools for annotating text at scale, but treating LLM-generated labels as deterministic overlooks substantial instability. Grounded in content analysis and generalizability theory, we diagnose five variance sources: construct specification, interface effects, model preferences, output extraction, and system-level aggregation. Empirical demonstrations show that minor design choices, such as prompt phrasing and model selection, can shift outcomes by 12-85 percentage points. Such variance threatens not only reproducibility but also econometric identification: annotation errors correlated with covariates bias parameter estimates regardless of average accuracy. We develop a variance-aware protocol specifying sampling budgets, aggregation rules, and reporting standards, and delineate scope conditions where LLM annotation should not be used. These…
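The claim that errors correlated with covariates bias estimates regardless of average accuracy can be checked with a small deterministic example. The sketch below uses made-up data, not results from the paper: two groups defined by a binary covariate have the same true positive rate, and two annotation scenarios each reach 90% accuracy. Only the scenario whose errors depend on the covariate produces a spurious group difference. All names (`flip`, `group_rate`, `accuracy`) and counts are hypothetical.

```python
def group_rate(labels):
    """Share of positive (1) labels in a group."""
    return sum(labels) / len(labels)


def accuracy(truth, labels):
    """Share of labels that match the truth."""
    return sum(t == y for t, y in zip(truth, labels)) / len(truth)


def flip(labels, n_pos_to_neg, n_neg_to_pos):
    """Flip the first n_pos_to_neg ones to 0 and the first n_neg_to_pos zeros to 1."""
    out = []
    for y in labels:
        if y == 1 and n_pos_to_neg > 0:
            out.append(0)
            n_pos_to_neg -= 1
        elif y == 0 and n_neg_to_pos > 0:
            out.append(1)
            n_neg_to_pos -= 1
        else:
            out.append(y)
    return out


# Same true positive rate (50%) in both covariate groups: the true gap is 0.
truth_a = [1] * 50 + [0] * 50  # covariate x = 0
truth_b = [1] * 50 + [0] * 50  # covariate x = 1

# Scenario 1: errors unrelated to x (5 false negatives, 5 false positives each).
flat_a, flat_b = flip(truth_a, 5, 5), flip(truth_b, 5, 5)

# Scenario 2: errors depend on x (10 misses when x = 0, 10 false alarms when x = 1).
skew_a, skew_b = flip(truth_a, 10, 0), flip(truth_b, 0, 10)

flat_acc = accuracy(truth_a + truth_b, flat_a + flat_b)
skew_acc = accuracy(truth_a + truth_b, skew_a + skew_b)
flat_gap = group_rate(flat_b) - group_rate(flat_a)
skew_gap = group_rate(skew_b) - group_rate(skew_a)

print(f"accuracy: flat={flat_acc:.2f}, skew={skew_acc:.2f}")        # both 0.90
print(f"estimated gap: flat={flat_gap:.2f}, skew={skew_gap:.2f}")  # 0.00 vs 0.20
```

Both scenarios have identical accuracy, so an accuracy check alone would not distinguish them; the difference shows up only when error rates are broken down by the covariate.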

Taxonomy

Topics: Computational and Text Analysis Methods · Forecasting Techniques and Applications · Machine Learning in Materials Science