An Efficient Active Learning Pipeline for Legal Text Classification
Sepideh Mamooler, Rémi Lebret, Stéphane Massonnet, Karl Aberer

TL;DR
This paper introduces a novel active learning pipeline tailored for legal text classification, combining pre-training, knowledge distillation, and an efficient initial sample selection to reduce annotation costs and improve model stability.
Contribution
It presents a new pipeline that enhances active learning for legal NLP by leveraging unlabeled data, semantic embedding guidance, and an efficient initial sample selection strategy.
Findings
Outperforms standard AL strategies on legal benchmarks.
Achieves comparable results to fully-supervised models with less labeled data.
Reduces annotation costs significantly.
Abstract
Active Learning (AL) is a powerful tool for learning with less labeled data, in particular for specialized domains, like legal documents, where unlabeled data is abundant but annotation requires domain expertise and is thus expensive. Recent works have shown the effectiveness of AL strategies for pre-trained language models. However, most AL strategies require a set of labeled samples to start with, which is expensive to acquire. In addition, pre-trained language models have been shown to be unstable during fine-tuning with small datasets, and their embeddings are not semantically meaningful. In this work, we propose a pipeline for effectively using active learning with pre-trained language models in the legal domain. To this end, we leverage the available unlabeled data in three phases. First, we continue pre-training the model to adapt it to the downstream task. Second, we use…
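To make the general setting concrete, here is a minimal sketch of pool-based active learning with a label-free initial selection step. It is not the authors' pipeline: the excerpt above does not describe their selection or query strategies, so farthest-point sampling for the initial set, entropy-based querying, and a nearest-centroid classifier (standing in for a fine-tuned language model over document embeddings) are all assumptions chosen for illustration. The function names `farthest_point_init`, `predict_proba`, and `active_learning` are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k, seed=0):
    """Pick k diverse pool indices without using any labels (cold start).
    Assumption: diversity in embedding space is a reasonable proxy here."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))  # point farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

def predict_proba(X_lab, y_lab, X):
    """Nearest-centroid classifier, a toy stand-in for a fine-tuned model."""
    classes = np.unique(y_lab)
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def active_learning(X, oracle, n_init=4, batch=4, rounds=5, seed=0):
    """Label an initial diverse set, then repeatedly query the most uncertain points."""
    labeled = farthest_point_init(X, n_init, seed)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(rounds):
        idx = np.array(labeled)
        y = np.array([labels[i] for i in labeled])
        proba = predict_proba(X[idx], y, X)
        entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
        entropy[idx] = -np.inf  # never re-query an already labeled point
        for i in np.argsort(entropy)[-batch:]:  # highest-entropy pool points
            labeled.append(int(i))
            labels[int(i)] = oracle(int(i))
    return labeled, labels

# Toy pool: two well-separated 2-D clusters standing in for document embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)
labeled, labels = active_learning(X, oracle=lambda i: int(y_true[i]))
print(len(labeled))  # 4 initial + 5 rounds * 4 queries = 24 labels requested
```

The `oracle` callback plays the role of the human annotator, so the number of oracle calls is the annotation cost that an active learning strategy tries to keep small.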
Taxonomy
Topics: Oil and Gas Production Techniques · Topic Modeling · Natural Language Processing Techniques
Methods: Knowledge Distillation
