From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs
Angelina Parfenova, David Graus, Juergen Pfeffer

TL;DR
This paper operationalizes axial coding of political debates with large language models, turning transcripts into concise, hierarchical categories. It compares two strategies, embedding clustering followed by LLM labeling and direct LLM-based grouping, and evaluates both.
Contribution
It introduces an LLM-based axial coding method with two grouping strategies for qualitative analysis of debate transcripts, and provides a publicly available dataset.
Findings
Embedding clustering achieves higher coverage and clearer structural separation.
Direct LLM grouping yields more interpretable, semantically aligned categories.
The two strategies trade off coverage against fine-grained alignment.
Abstract
Axial coding is a commonly used qualitative analysis method that enhances document understanding by organizing sentence-level open codes into broader categories. In this paper, we operationalize axial coding with large language models (LLMs). Extending an ensemble-based open coding approach with an LLM moderator, we add an axial coding step that groups open codes into higher-order categories, transforming raw debate transcripts into concise, hierarchical representations. We compare two strategies: (i) clustering embeddings of code-utterance pairs using density-based and partitioning algorithms followed by LLM labeling, and (ii) direct LLM-based grouping of codes and utterances into categories. We apply our method to Dutch parliamentary debates, converting lengthy transcripts into compact, hierarchically structured codes and categories. We evaluate our method using extrinsic metrics…
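Strategy (i) from the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence-embedding model, plain k-means stands in for the unspecified partitioning algorithm, and `label_cluster` is a hypothetical placeholder for the LLM labeling step. The example codes and utterances are invented toy data.

```python
from collections import Counter
import math

def embed(texts):
    """Toy bag-of-words embedding (assumed stand-in for a sentence-embedding model)."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for tok in t.lower().split():
            v[index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])  # unit length
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(far)
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_cluster(codes):
    """Hypothetical placeholder for LLM labeling: the most frequent word in the codes."""
    words = Counter(w for c in codes for w in c.lower().split())
    return words.most_common(1)[0][0]

def axial_code(pairs, k):
    """Cluster code-utterance pairs and return {category label: [open codes]}."""
    texts = [f"{code} {utterance}" for code, utterance in pairs]
    assign = kmeans(embed(texts), k)
    categories = {}
    for j in range(k):
        codes = [pairs[i][0] for i, a in enumerate(assign) if a == j]
        if codes:
            categories.setdefault(label_cluster(codes), []).extend(codes)
    return categories

# Invented toy input: (open code, utterance) pairs.
pairs = [
    ("housing shortage", "we need more affordable housing in cities"),
    ("housing costs", "rent for housing keeps rising"),
    ("climate targets", "the climate goals for emissions are too weak"),
    ("climate funding", "climate policy needs funding for emissions cuts"),
]
categories = axial_code(pairs, k=2)
print(categories)
```

In a realistic pipeline the embedding and labeling functions would be replaced by model calls, and a density-based algorithm could be swapped in for `kmeans` without changing `axial_code`'s interface.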
Taxonomy
Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling
