Scalable and consistent few-shot classification of survey responses using text embeddings
Jonas Timmann Mjaaland, Markus Fleten Kreutzer, Halvor Tyseng, Rebeckah K. Fussell, Gina Passante, N.G. Holmes, Anders Malthe-Sørenssen, and Tor Ole B. Odden

TL;DR
This paper presents a scalable, consistent, and minimally supervised text embedding-based framework for classifying open-ended survey responses, aligning well with qualitative research workflows and achieving high agreement with human coders.
Contribution
It introduces a classification framework based on text embeddings that requires only a few labeled examples per category and fits into standard qualitative analysis workflows.
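To make the few-shot idea concrete, here is a minimal sketch that assumes a nearest-prototype scheme. Each category is represented by the mean of its few labeled example embeddings, and a response receives every category whose cosine similarity to that prototype clears a threshold. The prototype averaging, the 0.5 threshold, the helper names, the category names, and the toy 3-dimensional vectors (standing in for the output of a real embedding model) are all hypothetical illustrations, not the authors' published method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def build_prototypes(examples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the unit-normalized embeddings of the few labeled examples per category."""
    prototypes: Dict[str, List[float]] = {}
    for label, vecs in examples.items():
        if not vecs:
            raise ValueError(f"category {label!r} has no examples")
        unit = [normalize(v) for v in vecs]
        dim = len(unit[0])
        prototypes[label] = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return prototypes


def classify(embedding: Vector, prototypes: Dict[str, List[float]],
             threshold: float = 0.5) -> List[str]:
    """Assign every category whose prototype is at least `threshold` similar (multi-label).

    The threshold value is a hypothetical placeholder, not a value from the paper.
    """
    return [label for label, proto in prototypes.items()
            if cosine(embedding, proto) >= threshold]


# Toy 3-d vectors standing in for real sentence embeddings (hypothetical data).
examples = {
    "energy": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "force": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
prototypes = build_prototypes(examples)
print(classify([0.95, 0.05, 0.0], prototypes))  # ['energy']
```

Thresholding each category independently, rather than taking a single argmax, lets one response carry several codes or none; whether that matches the paper's setup is an assumption.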
Findings
Achieves a Cohen's Kappa of 0.74 to 0.83 relative to expert human coders.
Performance improves with fine-tuning of embedding models.
Enables auditing and scaling of qualitative datasets.
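Cohen's Kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed fraction of matching labels and p_e is the agreement implied by each rater's label frequencies. The sketch below computes it for a single code, assuming human and model labels are aligned lists; the example labels are invented for illustration and are not data from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's Kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented binary codes for ten responses (1 = code applies), not data from the paper.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
model = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(human, model), 3))  # 0.6
```

Here the raters agree on 8 of 10 responses (p_o = 0.8), but because each assigns the code to half the responses, p_e = 0.5, so κ ≈ 0.6. This is why a Kappa in the 0.74 to 0.83 range indicates agreement well beyond chance rather than just a high match rate.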
Abstract
Qualitative analysis of open-ended survey responses is a commonly used research method in the social sciences, but traditional coding approaches are often time-consuming and prone to inconsistency. Existing solutions from Natural Language Processing such as supervised classifiers, topic modeling techniques, and generative large language models have limited applicability in qualitative analysis, since they demand extensive labeled data, disrupt established qualitative workflows, and/or yield variable results. In this paper, we introduce a text embedding-based classification framework that requires only a handful of examples per category and fits well with standard qualitative workflows. When benchmarked against human analysis of a conceptual physics survey consisting of 2899 open-ended responses, our framework achieves a Cohen's Kappa ranging from 0.74 to 0.83 as compared to expert human…
