Architectural Vulnerability and Reliability Challenges in AI Text Annotation: A Survey-Inspired Framework with Independent Probability Assessment
Linzhuo Li

TL;DR
This paper identifies fundamental architectural vulnerabilities in large language models caused by order sensitivity, demonstrating their impact on annotation reliability and proposing an independent probability assessment method to improve robustness.
Contribution
It introduces a survey-inspired framework and a novel reliability measure (R-score) to address positional bias in LLM annotations, advancing beyond traditional accuracy metrics.
Findings
Architectural constraints cause inconsistent annotations under perturbations.
Independent Probability Assessment reduces positional bias.
Order-sensitive annotations significantly affect downstream social science conclusions.
Abstract
Large Language Models, despite their power, have a fundamental architectural vulnerability stemming from their causal transformer design: order sensitivity. This architectural constraint may distort classification outcomes when prompt elements such as label options are reordered, revealing a theoretical gap between accuracy metrics and true model reliability. The paper conceptualizes this vulnerability through the lens of survey methodology, where respondent biases parallel LLM positional dependencies. Empirical evidence using the F1000 biomedical dataset across three scales of LLaMA3.1 models (8B, 70B, 405B) demonstrates that these architectural constraints produce inconsistent annotations under controlled perturbations. The paper advances a practical solution for social science, Independent Probability Assessment, which decouples label evaluation to circumvent positional bias…
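The two ideas in the abstract, checking whether an annotation survives reordering of the label options and scoring each label in a separate query, can be illustrated with a minimal sketch. This is not the paper's implementation: `order_consistency`, `independent_assessment`, the toy `biased_choose` and `keyword_score` functions, and the example labels are all hypothetical stand-ins for a real model call, and the paper's R-score is not reproduced here.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence


def order_consistency(
    text: str,
    labels: Sequence[str],
    choose: Callable[[str, List[str]], str],
) -> float:
    """Share of label orderings on which `choose` returns its most frequent answer.

    1.0 means the annotation is unaffected by option order.
    """
    picks = [choose(text, list(order)) for order in permutations(labels)]
    top = max(set(picks), key=picks.count)
    return picks.count(top) / len(picks)


def independent_assessment(
    text: str,
    labels: Sequence[str],
    score_label: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score every label in its own query, so no label is ever listed
    alongside the others and option order cannot influence the result."""
    return {label: score_label(text, label) for label in labels}


# Hypothetical stand-in for a positionally biased annotator:
# it always returns whichever option is listed first.
def biased_choose(text: str, ordered_labels: List[str]) -> str:
    return ordered_labels[0]


# Hypothetical stand-in for a per-label probability query.
def keyword_score(text: str, label: str) -> float:
    return 1.0 if label.lower() in text.lower() else 0.0


labels = ["approve", "reject", "neutral"]  # assumed example labels
text = "I would reject this manuscript."

consistency = order_consistency(text, labels, biased_choose)
scores = independent_assessment(text, labels, keyword_score)
best = max(scores, key=scores.get)
```

In this toy setup the biased annotator's answer changes with every reordering, while the per-label scores are the same regardless of how the labels are listed. With a real model, `score_label` would be replaced by a query that returns the model's probability for a single label.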
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
