Using Large Language Models to Support Thematic Analysis in Empirical Legal Studies
Jakub Drápal, Hannes Westermann, Jaromir Savelka

TL;DR
This paper presents a framework using GPT-4 to assist legal experts in thematic analysis, improving coding quality and theme discovery in empirical legal studies through collaborative AI-human interaction.
Contribution
It introduces a novel framework for integrating large language models into the thematic analysis process in legal research, enhancing collaboration and efficiency.
Findings
GPT-4 generated reasonable initial codes.
The model improved the quality of the codes based on expert feedback.
Themes discovered by GPT-4 aligned well with the legal experts' themes.
Abstract
Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework facilitating effective collaboration of a legal expert with a large language model (LLM) for generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We employed the framework for an analysis of a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts. The goal of the analysis was to discover classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes, and it was capable of improving the quality of the codes based on expert feedback. They also suggest that the model performed well in zero-shot classification of facts descriptions in…
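The three steps named in the abstract (initial codes, theme search, zero-shot classification) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the `stub_llm` stand-in are all hypothetical, and a real run would replace `stub_llm` with a call to an actual model such as GPT-4.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


def generate_initial_codes(llm: LLM, facts: List[str], feedback: str = "") -> Dict[int, List[str]]:
    """Phase 2: ask the model for short codes per facts description.

    `feedback` carries optional expert guidance for a second pass.
    """
    codes: Dict[int, List[str]] = {}
    for i, text in enumerate(facts):
        prompt = (
            "Assign short initial codes (comma-separated) to this facts description.\n"
            f"Expert feedback to respect: {feedback or 'none'}\n"
            f"Text: {text}"
        )
        codes[i] = [c.strip() for c in llm(prompt).split(",") if c.strip()]
    return codes


def search_for_themes(llm: LLM, codes: Dict[int, List[str]]) -> List[str]:
    """Phase 3: ask the model to group the unique codes into themes."""
    unique = sorted({c for cs in codes.values() for c in cs})
    prompt = "Group these codes into themes, one theme per line:\n" + "\n".join(unique)
    return [t.strip() for t in llm(prompt).splitlines() if t.strip()]


def classify(llm: LLM, facts: List[str], themes: List[str]) -> List[str]:
    """Kick-start of phase 4: zero-shot assignment of one theme per description."""
    labels = []
    for text in facts:
        prompt = f"Choose exactly one theme from {themes} for:\n{text}"
        answer = llm(prompt).strip()
        # Replies outside the theme list are flagged for expert review.
        labels.append(answer if answer in themes else "UNCLASSIFIED")
    return labels


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("Assign"):
        return "shoplifting, low value" if "store" in prompt else "bicycle theft"
    if prompt.startswith("Group"):
        return "Theft from shops\nTheft of vehicles"
    return "Theft from shops" if "store" in prompt else "Theft of vehicles"


facts = [
    "The defendant took goods from a store without paying.",
    "The defendant took a bicycle from a yard.",
]
codes = generate_initial_codes(stub_llm, facts)
themes = search_for_themes(stub_llm, codes)
labels = classify(stub_llm, facts, themes)
```

The expert stays in the loop at each step: codes can be regenerated with `feedback` filled in, the theme list can be edited by hand before classification, and any `UNCLASSIFIED` label marks a description for manual review.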
Taxonomy
Topics: Law in Society and Culture · Artificial Intelligence in Law · Legal Education and Practice Innovations
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection
