Attribution analysis of legal language as used by LLM

Richard K. Belew

arXiv:2501.17330 · cs.LG · January 30, 2025

TL;DR

This study analyzes how legal language influences LLM performance using attribution techniques, revealing that tokenizer differences significantly affect model behavior and classification accuracy in legal tasks.

Contribution

It introduces an attribution-based approach to understand legal language processing in LLMs and highlights the impact of tokenization differences on model performance.
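Tokenization differences matter because the same legal word can reach two models as different token sequences. BERT-style models use WordPiece, which segments each word by greedy longest-match-first lookup in a fixed vocabulary. The sketch below is a simplified version of that procedure. The two vocabularies are hypothetical: a "generic" one that lacks legal terms and a "legal" one that adds them. It shows how "overruling" becomes three pieces under one vocabulary and a single token under the other.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    # Simplified greedy longest-match-first WordPiece segmentation.
    # Non-initial pieces carry the "##" continuation prefix, as in BERT.
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical vocabularies, for illustration only.
generic_vocab = {"over", "##rul", "##ing", "hold"}
legal_vocab = generic_vocab | {"overruling", "holding"}

for word in ["overruling", "holding"]:
    print(word, wordpiece(word, generic_vocab), wordpiece(word, legal_vocab))
```

With the generic vocabulary, "overruling" is split into `over`, `##rul`, `##ing`. With the legal vocabulary it stays whole, so any attribution a model assigns lands on one token rather than being spread over three.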

Findings

01

Tokenizer differences explain most performance variations.

02

Attribution techniques reveal model decision reasons.

03

Legal tokens can be identified through frequency and stop word analysis.
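The page does not describe how frequency and stop-word analysis were combined. One simple approach, assumed here purely for illustration, is to drop stop words and then rank the remaining tokens by how much more often they occur in a legal corpus than in a generic one. The ranking uses an add-alpha smoothed log ratio of relative frequencies. The corpora, the stop-word list, the regex tokenizer and the `alpha` parameter below are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that",
              "for", "on", "by", "was", "we", "it"}

def tokenize(text):
    # Lowercased alphabetic word tokens (a stand-in for a model tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def legal_terms(legal_docs, generic_docs, top_k=5, alpha=1.0):
    # Count non-stop-word tokens in each corpus.
    legal = Counter(t for d in legal_docs for t in tokenize(d) if t not in STOP_WORDS)
    generic = Counter(t for d in generic_docs for t in tokenize(d) if t not in STOP_WORDS)
    vocab = set(legal) | set(generic)
    n_legal = sum(legal.values()) + alpha * len(vocab)
    n_generic = sum(generic.values()) + alpha * len(vocab)

    def log_ratio(t):
        # Smoothed log ratio of relative frequencies: legal vs. generic.
        return (math.log((legal[t] + alpha) / n_legal)
                - math.log((generic[t] + alpha) / n_generic))

    # Highest ratio first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-log_ratio(t), t))[:top_k]

legal_docs = [
    "The court overruled the precedent and held that the statute applies.",
    "We hold that the appellant's claim is barred by the statute of limitations.",
    "The precedent is overruled to the extent it conflicts with this holding.",
]
generic_docs = [
    "The weather was pleasant and we walked to the park.",
    "She held the door open for the delivery driver.",
    "The recipe calls for two cups of flour and a pinch of salt.",
]
print(legal_terms(legal_docs, generic_docs, top_k=3))
# → ['overruled', 'precedent', 'statute']
```

Removing stop words first keeps very frequent function words from dominating both corpora. Smoothing keeps tokens that never appear in the generic corpus from producing an infinite ratio.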

Abstract

Three publicly available LLMs specifically designed for legal tasks have been implemented, and they show that classification accuracy can benefit from training over legal corpora, but why and how? Here we use two publicly available legal datasets: a simpler binary classification task of "overruling" texts, and a more elaborate multiple-choice task identifying "holding" judicial decisions. We report on experiments contrasting the legal LLMs with a generic BERT model, for comparison, on both datasets. We use integrated gradient attribution techniques to impute "causes" of variation in the models' performance, and characterize them in terms of the tokenizations each uses. We find that while all models can correctly classify some test examples from the casehold task, other examples can be identified by only one model, and attribution can be used to highlight the reasons for this. We…
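Integrated gradients attributes a model output F(x) to each input feature by integrating the gradient along a straight path from a baseline x' to the input x: IG_i = (x_i - x'_i) * ∫_0^1 ∂F(x' + α(x - x'))/∂x_i dα. The abstract does not give implementation details, so the following is only an illustration under stated assumptions. A toy mean-pooled logistic classifier stands in for an LLM. The token embeddings, weights and all-zero baseline are hypothetical. The integral is approximated with a midpoint Riemann sum, and per-token scores come from summing attributions over embedding dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(X, w, b):
    # Hypothetical stand-in for a classifier head: mean-pool the token
    # embeddings X (tokens x dims), then apply a logistic output.
    return sigmoid(X.mean(axis=0) @ w + b)

def score_grad(X, w, b):
    # Analytic gradient of score() w.r.t. every entry of X; mean pooling
    # gives each token row the same gradient.
    p = score(X, w, b)
    row = p * (1.0 - p) * w / X.shape[0]
    return np.tile(row, (X.shape[0], 1))

def integrated_gradients(X, baseline, w, b, steps=200):
    # Midpoint Riemann approximation of the path integral baseline -> X.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(X)
    for a in alphas:
        total += score_grad(baseline + a * (X - baseline), w, b)
    attributions = (X - baseline) * total / steps
    # Sum over embedding dimensions to get one attribution per token.
    return attributions.sum(axis=1)

tokens = ["the", "court", "hereby", "overrules"]  # hypothetical input
rng = np.random.default_rng(0)
X = rng.normal(size=(len(tokens), 8))   # hypothetical token embeddings
w = rng.normal(size=8)                  # hypothetical classifier weights
b = 0.1
baseline = np.zeros_like(X)             # all-zero baseline (an assumption)

token_attr = integrated_gradients(X, baseline, w, b)
for tok, a in zip(tokens, token_attr):
    print(f"{tok:>10s} {a:+.4f}")
# Completeness check: attributions should sum to score(X) - score(baseline).
print(token_attr.sum(), score(X, w, b) - score(baseline, w, b))
```

The last line checks the completeness property of integrated gradients. Up to discretization error, the per-token attributions add up to the change in the model's score between the baseline and the actual input. That property is what lets attribution "explain" a particular classification.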

Taxonomy

Topics: linguistics and terminology studies

Methods: Attention Is All You Need · Layer Normalization · Softmax · Linear Warmup With Linear Decay · Adam · Residual Connection · Dropout · Linear Layer · Dense Connections