Legal Transformer Models May Not Always Help
Saibo Geng, Rémi Lebret, Karl Aberer

TL;DR
This paper evaluates domain adaptive pre-training and language adapters on legal NLP tasks. It finds that domain adaptive pre-training helps only on low-resource downstream tasks, that adapters match full model tuning at a much lower training cost, and it releases LegalRoBERTa.
Contribution
It benchmarks domain adaptive pre-training across several legal NLP tasks and dataset splits, and adapters on a typical legal NLP task, showing where each helps and where it does not.
Findings
Domain adaptive pre-training helps only low-resource tasks.
Adapters achieve performance similar to full model tuning at a much lower training cost.
LegalRoBERTa is a new pre-trained legal language model.
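The cost side of the adapter finding can be made concrete with a parameter count. This is a minimal sketch, assuming a Houlsby-style bottleneck adapter (a down-projection to a small bottleneck followed by an up-projection, two adapters per transformer layer) on a RoBERTa-base-sized backbone. The hidden size, bottleneck size, layer count, and backbone size are illustrative assumptions, not the configuration reported in the paper.

```python
# Illustrative assumptions only: RoBERTa-base-like sizes and a Houlsby-style
# bottleneck adapter. Layer-norm and task-head parameters are not counted.

def adapter_params(hidden: int, bottleneck: int) -> int:
    """Parameters of one bottleneck adapter: down- and up-projection, with biases."""
    down = hidden * bottleneck + bottleneck  # W_down and b_down
    up = bottleneck * hidden + hidden        # W_up and b_up
    return down + up

def trainable_adapter_params(hidden: int, bottleneck: int, layers: int, per_layer: int = 2) -> int:
    """New trainable parameters when the backbone is frozen and only adapters train."""
    return adapter_params(hidden, bottleneck) * layers * per_layer

hidden, bottleneck, layers = 768, 64, 12  # assumed sizes
backbone = 125_000_000                    # approximate RoBERTa-base size, assumed

trainable = trainable_adapter_params(hidden, bottleneck, layers)
print(f"adapter params: {trainable:,}")                     # → adapter params: 2,379,264
print(f"fraction of backbone: {trainable / backbone:.2%}")  # → fraction of backbone: 1.90%
```

Under these assumptions only about 2% of the backbone's parameter count is trained per task, which shrinks optimizer state and per-task storage; the forward and backward passes still run through the full frozen network.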
Abstract
Deep learning-based Natural Language Processing methods, especially transformers, have achieved impressive performance in the last few years. Applying those state-of-the-art NLP methods to legal activities to automate or simplify some simple work is of great value. This work investigates the value of domain adaptive pre-training and language adapters in legal NLP tasks. By comparing the performance of language models with domain adaptive pre-training on different tasks and different dataset splits, we show that domain adaptive pre-training is only helpful with low-resource downstream tasks, thus far from being a panacea. We also benchmark the performance of adapters in a typical legal NLP task and show that they can yield similar performance to full model tuning with much smaller training costs. As an additional result, we release LegalRoBERTa, a RoBERTa model further pre-trained on…
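Domain adaptive pre-training continues the self-supervised pre-training of a general model on in-domain text before fine-tuning. Assuming LegalRoBERTa keeps RoBERTa's masked-language-modeling objective (the truncated abstract does not spell out the recipe), the step that turns tokenized legal text into training examples can be sketched as follows. The 15% selection rate and 80/10/10 replacement split follow the BERT/RoBERTa recipe; the token ids and vocabulary size are assumptions for illustration.

```python
import random

MASK_TOKEN_ID = 50264  # assumed id of <mask> in the RoBERTa-base tokenizer
VOCAB_SIZE = 50265     # assumed RoBERTa-base vocabulary size
IGNORE_INDEX = -100    # label value conventionally ignored by the loss

def mask_tokens(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) for one masked-language-modeling example.

    Each position is selected with probability mask_prob. A selected position is
    replaced by <mask> 80% of the time, by a random token 10% of the time, and
    kept unchanged 10% of the time. Labels hold the original token only at
    selected positions, so the loss is computed only there.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: no loss at this position
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_TOKEN_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # otherwise keep the original token
    return inputs, labels

rng = random.Random(0)
sentence = list(range(100, 140))  # placeholder ids standing in for tokenized legal text
inputs, labels = mask_tokens(sentence, rng)
```

Calling `mask_tokens` again on the same sequence each epoch draws a fresh mask, which is the dynamic masking RoBERTa uses instead of a fixed, precomputed mask. A real pipeline would also exclude special and padding tokens from selection.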
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Law · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Layer Normalization · Softmax · Residual Connection
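The taxonomy lists Linear Warmup With Linear Decay among the methods; it is the learning-rate schedule commonly used to pre-train and fine-tune RoBERTa-family models. A minimal sketch of that schedule follows. The peak learning rate and step counts in the usage loop are illustrative assumptions, not the paper's hyperparameters.

```python
def linear_warmup_linear_decay(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over warmup_steps, then falls linearly
    to 0 at total_steps, and stays at 0 afterwards.
    """
    if not 0 <= warmup_steps < total_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup phase
    # decay phase: fraction of the post-warmup span still remaining, clipped at 0
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Illustrative settings only.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_linear_decay(step, peak_lr=1e-4, warmup_steps=100, total_steps=1000))
```

This has the same shape as the multiplier applied by `get_linear_schedule_with_warmup` in Hugging Face Transformers, which scales the optimizer's base learning rate rather than returning an absolute value.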
