# Exploring Selective Retrieval-Augmentation for Long-Tail Legal Text Classification

**Authors:** Boheng Mao

arXiv: 2508.19997 · 2025-09-01

## TL;DR

This paper introduces Selective Retrieval-Augmentation (SRA), a method that improves long-tail legal text classification by augmenting only samples from low-frequency classes with text retrieved from the training set, without changing the model architecture or using external data.

## Contribution

The paper proposes SRA, which improves performance on rare classes in legal text classification by retrieval-augmenting only the training samples that carry low-frequency labels, using the training set itself as the retrieval corpus.

## Key findings

- SRA is evaluated on two long-tail legal benchmarks: LEDGAR (single-label) and UNFAIR-ToS (multi-label).
- SRA achieves consistent gains in both micro-F1 and macro-F1 over the LexGLUE baselines.
- No external data or model modifications are required.
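Because the findings are reported in both micro-F1 and macro-F1, a small worked example shows why macro-F1 is the metric that exposes rare-class failures. The sketch below is a plain-Python illustration for the single-label case (as in LEDGAR); the function name `f1_scores` and the toy labels are hypothetical, and this is not the LexGLUE evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # true class gets a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro-F1 pools counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-class F1, so each class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Toy long-tail split: 9 "head" samples, 1 "tail" sample; the model always predicts "head".
y_true = ["head"] * 9 + ["tail"]
y_pred = ["head"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")  # micro-F1 = 0.900, macro-F1 = 0.474
```

A classifier that ignores the tail class keeps micro-F1 at 0.9 (in the single-label case micro-F1 equals accuracy) while macro-F1 drops to about 0.47, which is why improvements on rare labels tend to show up mainly in macro-F1.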

## Abstract

Legal text classification is a fundamental NLP task in the legal domain. Benchmark datasets in this area often exhibit a long-tail label distribution, where many labels are underrepresented, leading to poor model performance on rare classes. This paper explores Selective Retrieval-Augmentation (SRA) as a proof-of-concept approach to this problem. SRA augments only the training samples that belong to low-frequency labels, which avoids introducing noise for well-represented classes, and it requires no changes to the model architecture. Retrieval is performed only from the training data, which rules out potential information leakage and at the same time removes the need for external corpora. SRA is tested on two legal text classification benchmark datasets with long-tail distributions: LEDGAR (single-label) and UNFAIR-ToS (multi-label). Results show that SRA achieves consistent gains in both micro-F1 and macro-F1 over LexGLUE baselines.
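The selective step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the rarity threshold `max_count`, the top-`k` neighbour count, the `[SEP]` concatenation, and the function name `selective_retrieval_augment` are all hypothetical, and bag-of-words cosine similarity stands in for whatever retriever the paper actually uses. It covers only the training-set side; how test inputs are handled is not described in this summary. Labels are given as sets so the same code handles single-label (LEDGAR-style) and multi-label (UNFAIR-ToS-style) samples.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def selective_retrieval_augment(texts, labels, max_count=2, k=1, sep=" [SEP] "):
    """Retrieval-augment only training samples that carry a low-frequency label.

    texts:     training documents
    labels:    one set of labels per document (singleton sets for single-label data)
    max_count: HYPOTHETICAL threshold; a label is rare if it occurs <= max_count times
    k, sep:    HYPOTHETICAL neighbour count and concatenation separator
    The bag-of-words retriever is a stand-in, not the paper's retriever.
    """
    freq = Counter(lab for label_set in labels for lab in label_set)
    rare = {lab for lab, count in freq.items() if count <= max_count}
    vectors = [bow(t) for t in texts]
    augmented = []
    for i, (text, label_set) in enumerate(zip(texts, labels)):
        if not (label_set & rare):
            # Well-represented sample: leave it untouched so no noise is added.
            augmented.append(text)
            continue
        # Retrieve only from the training set, skipping the sample itself.
        scored = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(len(texts)) if j != i),
            reverse=True,
        )
        neighbours = [texts[j] for _, j in scored[:k]]
        augmented.append(sep.join([text] + neighbours))
    return augmented


# Toy usage: "termination" occurs twice (rare), "governing_law" three times.
texts = [
    "either party may terminate this agreement upon notice",
    "this agreement may be terminated upon written notice",
    "this agreement is governed by the laws of new york",
    "governed by the laws of the state of delaware",
    "the laws of england govern this agreement",
]
labels = [{"termination"}, {"termination"}, {"governing_law"},
          {"governing_law"}, {"governing_law"}]
augmented = selective_retrieval_augment(texts, labels)
for line in augmented:
    print(line)
```

In this toy run only the two "termination" clauses are augmented, each with the other as its nearest neighbour, while the three "governing_law" clauses pass through unchanged. Excluding the query document from its own neighbour list matters: its self-similarity is always 1, so without that check every sample would simply retrieve a copy of itself.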

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.19997/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2508.19997/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/2508.19997/full.md

---
Source: https://tomesphere.com/paper/2508.19997