OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering
Mia Mohammad Imran, Tarannum Shaila Zaman

TL;DR
This paper introduces OLAF, a framework for improving the reliability, calibration, and transparency of LLM-based annotations in empirical software engineering, emphasizing the need for standardized measurement and reproducibility.
Contribution
It proposes a conceptual framework, OLAF, to organize key constructs for reliable and transparent LLM-based annotation in software engineering research.
Findings
Highlights the importance of reliability and calibration in LLM annotations
Identifies key constructs: reliability, calibration, drift, consensus, aggregation, transparency
Motivates future empirical studies for standardization and reproducibility
Abstract
Large Language Models (LLMs) are increasingly used in empirical software engineering (ESE) to automate or assist annotation tasks such as labeling commits, issues, and qualitative artifacts. Yet the reliability and reproducibility of such annotations remain underexplored. Existing studies often lack standardized measures for reliability, calibration, and drift, and frequently omit essential configuration details. We argue that LLM-based annotation should be treated as a measurement process rather than a purely automated activity. In this position paper, we outline the Operationalization for LLM-based Annotation Framework (OLAF), a conceptual framework that organizes key constructs: reliability, calibration, drift, consensus, aggregation, and transparency. The paper aims to motivate methodological discussion and future empirical work toward more transparent and…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Scientific Computing and Data Management
