Scalable and consistent few-shot classification of survey responses using text embeddings

Jonas Timmann Mjaaland, Markus Fleten Kreutzer, Halvor Tyseng, Rebeckah K. Fussell, Gina Passante, N.G. Holmes, Anders Malthe-Sørenssen, and Tor Ole B. Odden

arXiv:2508.19836 · cs.CL · August 28, 2025

TL;DR

This paper presents a scalable, consistent, and minimally supervised text embedding-based framework for classifying open-ended survey responses, aligning well with qualitative research workflows and achieving high agreement with human coders.

Contribution

It introduces a novel classification framework using text embeddings that requires minimal labeled data and integrates seamlessly with existing qualitative analysis methods.

Findings

01

Achieves a Cohen's Kappa of 0.74 to 0.83 against expert human coders on a conceptual physics survey of 2899 open-ended responses.

02

Performance improves with fine-tuning of embedding models.

03

Enables auditing and scaling of qualitative datasets.
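
The agreement figure in the first finding is Cohen's Kappa, which corrects raw agreement between two coders for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula, not the authors' evaluation code. The label lists `human` and `model` are made-up illustrative data, and the function name `cohens_kappa` is chosen here for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six responses (illustrative only, not from the paper).
human = ["A", "A", "B", "B", "A", "B"]
model = ["A", "A", "B", "A", "A", "B"]
kappa = cohens_kappa(human, model)
```

Here the two coders agree on five of six items (p_o = 5/6) and chance agreement is p_e = 0.5, so kappa is 2/3.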

Abstract

Qualitative analysis of open-ended survey responses is a commonly used research method in the social sciences, but traditional coding approaches are often time-consuming and prone to inconsistency. Existing solutions from Natural Language Processing such as supervised classifiers, topic modeling techniques, and generative large language models have limited applicability in qualitative analysis, since they demand extensive labeled data, disrupt established qualitative workflows, and/or yield variable results. In this paper, we introduce a text embedding-based classification framework that requires only a handful of examples per category and fits well with standard qualitative workflows. When benchmarked against human analysis of a conceptual physics survey consisting of 2899 open-ended responses, our framework achieves a Cohen's Kappa ranging from 0.74 to 0.83 as compared to expert human…
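
To make the idea of classifying from "a handful of examples per category" concrete, the sketch below shows one common way to do few-shot classification with embeddings: average the example embeddings for each category into a centroid, then assign each new response to the category whose centroid has the highest cosine similarity. This is an assumption for illustration; the abstract does not specify the paper's actual decision rule. The function `make_toy_embedder` is a hypothetical bag-of-words stand-in for a real text embedding model, and the categories and example sentences are invented.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(examples: Dict[str, List[str]],
                  embed: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Embed the few labeled examples and average them per category."""
    return {label: centroid([embed(t) for t in texts])
            for label, texts in examples.items()}


def classify(response: str, centroids: Dict[str, Vector],
             embed: Callable[[str], Vector]) -> str:
    """Return the category whose centroid is most similar to the response."""
    vec = embed(response)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def make_toy_embedder(corpus: List[str]) -> Callable[[str], Vector]:
    """Hypothetical stand-in for an embedding model: word counts over a
    vocabulary built from `corpus`; words outside the vocabulary are ignored."""
    vocab = sorted({w for text in corpus for w in text.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocab)
        for w in text.lower().split():
            if w in index:
                vec[index[w]] += 1.0
        return vec

    return embed


# Invented categories and examples (illustrative only, not from the paper).
examples = {
    "energy": ["the energy is quantized", "energy levels are discrete"],
    "measurement": ["measurement collapses the state",
                    "the measurement changes the state"],
}
embed = make_toy_embedder([t for texts in examples.values() for t in texts])
centroids = fit_centroids(examples, embed)
prediction = classify("discrete energy levels", centroids, embed)
```

Because the rule is deterministic given the embeddings, the same response always receives the same label, which is one way such a setup can avoid the run-to-run variability the abstract attributes to generative models. In practice the toy embedder would be replaced by a pretrained or fine-tuned text embedding model.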
