Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis
Angelina Parfenova, Andreas Marfurt, Alexander Denzler, Juergen Pfeffer

TL;DR
This study compares human experts and large language models in inductive coding for qualitative data analysis, revealing contrasting strengths and weaknesses in labeling complex versus simple data, and examining alignment with gold standards.
Contribution
The paper presents a systematic comparison of LLMs and human experts on inductive coding, showing how their performance differs with sentence difficulty and how their labels are evaluated against a gold standard.
Findings
Humans excel at labeling complex sentences.
LLMs perform better on simpler sentences.
Human labels are often rated more favorably even when they deviate from the gold standard.
Abstract
This paper investigates the automation of qualitative data analysis, focusing on inductive coding using large language models (LLMs). Unlike traditional approaches that rely on deductive methods with predefined labels, this research investigates the inductive process where labels emerge from the data. The study evaluates the performance of six open-source LLMs compared to human experts. As part of the evaluation, experts rated the perceived difficulty of the quotes they coded. The results reveal a peculiar dichotomy: human coders consistently perform well when labeling complex sentences but struggle with simpler ones, while LLMs exhibit the opposite trend. Additionally, the study explores systematic deviations in both human and LLM-generated labels by comparing them to the gold standard from the test set. While human annotations may sometimes differ from the gold standard, they are…
Taxonomy
Topics: Qualitative Research Methods and Applications · Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining
