Architectural Vulnerability and Reliability Challenges in AI Text Annotation: A Survey-Inspired Framework with Independent Probability Assessment

Linzhuo Li

arXiv:2502.19679·cs.DL·May 27, 2025

Open Access

TL;DR

This paper identifies fundamental architectural vulnerabilities in large language models caused by order sensitivity, demonstrating their impact on annotation reliability and proposing an independent probability assessment method to improve robustness.

Contribution

It introduces a survey-inspired framework and a novel reliability measure (R-score) to address positional bias in LLM annotations, advancing beyond traditional accuracy metrics.

Findings

01

Architectural constraints cause inconsistent annotations under perturbations.

02

Independent Probability Assessment reduces positional bias.

03

Order-sensitive annotations significantly affect downstream social science conclusions.

Abstract

Large Language Models, despite their power, have a fundamental architectural vulnerability stemming from their causal transformer design: order sensitivity. This architectural constraint may distort classification outcomes when prompt elements such as label options are reordered, revealing a theoretical gap between accuracy metrics and true model reliability. The paper conceptualizes this vulnerability through the lens of survey methodology, where respondent biases parallel LLM positional dependencies. Empirical evidence using the F1000 biomedical dataset across three scales of LLaMA3.1 models (8B, 70B, 405B) demonstrates that these architectural constraints produce inconsistent annotations under controlled perturbations. The paper advances a practical solution for social science, Independent Probability Assessment, which decouples label evaluation to circumvent positional bias…
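To make the contrast between order-sensitive annotation and decoupled label evaluation concrete, the following is a minimal sketch and not the paper's implementation. Everything in it is an assumption made for illustration: `toy_relevance` is a hypothetical stand-in for a model's belief that a label applies, `position_bonus` is an artificial way of simulating positional bias, and the example text and labels are invented. In a real setting, the per-label score would presumably come from querying an LLM about one label at a time.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence


def toy_relevance(text: str, label: str) -> float:
    """Hypothetical stand-in for a model's belief that `label` applies to `text`."""
    words = text.lower().split()
    return 0.9 if label.lower() in words else 0.1


def joint_choice(text: str, labels: Sequence[str], position_bonus: float = 2.0) -> str:
    """Simulated multiple-choice annotation with positional bias.

    Options listed earlier receive an artificial bonus (an assumption used
    only to mimic order sensitivity), so the winner can depend on label order.
    """
    scores = {
        label: toy_relevance(text, label) + position_bonus / (rank + 1)
        for rank, label in enumerate(labels)
    }
    return max(scores, key=scores.get)


def independent_assessment(
    text: str,
    labels: Sequence[str],
    score: Callable[[str, str], float] = toy_relevance,
) -> Dict[str, float]:
    """Score each label on its own, so no score depends on the order of the list."""
    return {label: score(text, label) for label in sorted(labels)}


def independent_choice(text: str, labels: Sequence[str]) -> str:
    """Pick the label with the highest independently assessed score."""
    probs = independent_assessment(text, labels)
    return max(probs, key=probs.get)


# Invented example input; not taken from the F1000 dataset.
text = "a randomized trial of a new vaccine"
labels = ["genetics", "vaccine", "imaging"]

# Collect the distinct answers produced across every ordering of the labels.
joint_answers = {joint_choice(text, order) for order in permutations(labels)}
independent_answers = {independent_choice(text, order) for order in permutations(labels)}

print("joint prompt answers across orderings:", sorted(joint_answers))
print("independent answers across orderings:", sorted(independent_answers))
```

In this toy setup the simulated joint prompt returns different labels depending on which option is listed first, while the independent assessment returns the same label for every ordering, because each label is scored without reference to the others.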

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques