Exploiting the Randomness of Large Language Models (LLM) in Text Classification Tasks: Locating Privileged Documents in Legal Matters
Keith Huffman, Jianping Zhang, Nathaniel Huber-Fliflet, Fusheng Wei, Peter Gronvall

TL;DR
This study empirically examines how randomness in large language models affects legal document classification, proposing a methodology to leverage randomness for improved accuracy and confidence in privileged document detection.
Contribution
The paper introduces a novel methodology that leverages randomness control in LLMs to enhance classification accuracy and reliability in legal document filtering tasks.
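The summary does not say which randomness controls the study varies. Sampling temperature is the most common one exposed by LLM APIs, so the following is a minimal sketch, not the authors' setup, of how temperature reshapes the probability of each label. It assumes a two-label task ("privileged" / "not_privileged") and hypothetical model scores for one document. With these scores, P(privileged) is about 0.73 at temperature 1.0 and about 0.99995 at temperature 0.1, so repeated runs disagree far less often at low temperature. The fixed `random.Random` seed stands in for the seed parameter that some hosted LLM APIs expose, usually with only best-effort determinism.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw label scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (temperature 0 is usually handled as greedy argmax)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_label(labels: list[str], logits: list[float], temperature: float, rng: random.Random) -> str:
    """Draw one label according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(labels, weights=probs, k=1)[0]


if __name__ == "__main__":
    labels = ["privileged", "not_privileged"]
    logits = [2.0, 1.0]  # hypothetical model scores for one document
    rng = random.Random(0)  # fixed seed makes the draws reproducible
    for t in (1.0, 0.1):
        probs = softmax_with_temperature(logits, t)
        draws = [sample_label(labels, logits, t, rng) for _ in range(10)]
        print(t, [round(p, 5) for p in probs], draws.count("privileged"))
```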
Findings
LLMs effectively identify privileged documents
Randomness control has minimal impact on performance
Leveraging randomness improves classification accuracy
Abstract
In legal matters, text classification models are most often used to filter large datasets for documents that meet pre-selected criteria, such as relevance to a particular subject matter or identification of legally privileged communications and attorney-directed documents. In this context, large language models (LLMs) have demonstrated strong performance. This paper presents an empirical study of the role of randomness in LLM-based classification for attorney-client privileged document detection, focusing on four key dimensions: (1) the effectiveness of LLMs in identifying legally privileged documents, (2) the influence of randomness control parameters on classification outputs, (3) the impact of those parameters on overall classification performance, and (4) a methodology for leveraging randomness to enhance accuracy. Experimental results showed that LLMs can identify privileged documents…
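The abstract is cut off before the methodology is described, so the following is not the authors' procedure. It sketches one common way to exploit sampling randomness: classify the same document several times with sampling enabled, take a majority vote, and use the agreement rate as a confidence signal. `classify_once` and `fake_classify_once` are hypothetical stand-ins for a single LLM call; the 80% rate in the fake is invented purely for illustration.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(labels: list[str]) -> tuple[str, float]:
    """Return the most frequent label and the fraction of runs that produced it."""
    if not labels:
        raise ValueError("need at least one classification run")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def classify_with_repeats(
    document: str, classify_once: Callable[[str], str], n_runs: int = 5
) -> tuple[str, float]:
    """Classify one document n_runs times and aggregate the labels by vote.

    An odd n_runs avoids ties when there are only two labels.
    """
    runs = [classify_once(document) for _ in range(n_runs)]
    return majority_vote(runs)


# Hypothetical stand-in for an LLM call with temperature > 0:
# it answers "privileged" about 80% of the time.
_rng = random.Random(42)


def fake_classify_once(document: str) -> str:
    return "privileged" if _rng.random() < 0.8 else "not_privileged"


if __name__ == "__main__":
    label, agreement = classify_with_repeats(
        "Email to outside counsel re: draft contract", fake_classify_once, n_runs=7
    )
    print(label, round(agreement, 2))
```

Under this sketch, documents whose agreement falls below some threshold could be routed to attorney review; that threshold is an assumption and would need tuning on labeled data.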
Taxonomy
Topics: Artificial Intelligence in Law · Authorship Attribution and Profiling · Text and Document Classification Technologies
