Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters

Keith Huffman; Jianping Zhang; Nathaniel Huber-Fliflet; Fusheng Wei; Peter Gronvall

arXiv:2512.08083·cs.IR·December 10, 2025

Open Access

TL;DR

This study empirically examines how randomness in large language models affects legal document classification, proposing a methodology to leverage randomness for improved accuracy and confidence in privileged document detection.

Contribution

The paper introduces a novel methodology that leverages randomness control in LLMs to enhance classification accuracy and reliability in legal document filtering tasks.

Findings

01 LLMs effectively identify privileged documents

02 Randomness control has minimal impact on performance

03 Leveraging randomness improves classification accuracy

Abstract

In legal matters, text classification models are most often used to filter large datasets for documents that meet certain pre-selected criteria, for example relevance to a particular subject matter such as legally privileged communications and attorney-directed documents. In this context, large language models (LLMs) have demonstrated strong performance. This paper presents an empirical study investigating the role of randomness in LLM-based classification for attorney-client privileged document detection, focusing on four key dimensions: (1) the effectiveness of LLMs in identifying legally privileged documents, (2) the influence of randomness control parameters on classification outputs, (3) the impact of those parameters on overall classification performance, and (4) a methodology for leveraging randomness to enhance accuracy. Experimental results showed that LLMs can identify privileged documents…
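The abstract is truncated before it describes the methodology, so the following is only a minimal sketch of one common way to exploit sampling randomness: classify the same document several times and take a majority vote, using the agreement rate as a rough confidence signal. This is an assumption for illustration and may not match the authors' method. The names `vote`, `classify_with_voting`, and `make_noisy_stub` are hypothetical, and the stub stands in for a real LLM call made with nonzero sampling randomness.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def vote(labels: List[str]) -> Tuple[str, float]:
    """Return the majority label and the fraction of runs that agreed with it."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)


def classify_with_voting(
    classify_once: Callable[[str], str],
    document: str,
    runs: int = 5,
) -> Tuple[str, float]:
    """Call a (possibly random) classifier `runs` times and aggregate by vote.

    With two labels, an odd `runs` value avoids ties.
    """
    labels = [classify_once(document) for _ in range(runs)]
    return vote(labels)


def make_noisy_stub(p_privileged: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: answers 'privileged' with
    probability `p_privileged`, independent of the document text."""
    rng = random.Random(seed)

    def classify_once(document: str) -> str:
        return "privileged" if rng.random() < p_privileged else "not_privileged"

    return classify_once


if __name__ == "__main__":
    stub = make_noisy_stub(p_privileged=0.8, seed=0)
    label, agreement = classify_with_voting(stub, "Email from counsel ...", runs=5)
    print(label, agreement)
```

In a sketch like this, documents with low agreement across runs could be routed to human review, while high-agreement documents are accepted automatically; where to set that cutoff is a design choice the abstract does not specify.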

Taxonomy

Topics: Artificial Intelligence in Law · Authorship Attribution and Profiling · Text and Document Classification Technologies