LEGAL-BERT: The Muppets straight out of Law School

Ilias Chalkidis; Manos Fergadiotis; Prodromos Malakasiotis; Nikolaos Aletras; Ion Androutsopoulos

arXiv:2010.02559 · cs.CL · October 7, 2020


TL;DR

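This paper investigates how best to adapt BERT to legal NLP tasks. It shows that standard pre-training and fine-tuning guidelines do not always generalize to the legal domain, and proposes a systematic comparison of adaptation strategies, a broader hyper-parameter search for fine-tuning, and new models.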

Contribution

It systematically evaluates strategies for adapting BERT to the legal domain and releases LEGAL-BERT, a family of models tailored to legal NLP applications.

Findings

1. Standard pre-training and fine-tuning guidelines do not always generalize well to legal NLP.
2. Adapting BERT with additional pre-training on legal corpora improves performance.
3. The paper releases a family of LEGAL-BERT models for legal NLP tasks.
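The second finding concerns additional pre-training on legal corpora. Assuming this reuses BERT's masked-language-modelling (MLM) objective (this page does not spell out the objective), the standard BERT corruption rule can be sketched as follows: each token chosen for prediction is replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the time, and left unchanged 10% of the time. The function name `mask_tokens` and the toy vocabulary are hypothetical, and selecting each position independently with probability `mask_prob` is a simplification of the original BERT data pipeline, which samples a fixed number of positions per sequence.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style MLM corruption (simplified sketch).

    Returns the corrupted sequence and a label list that holds the original
    token at every position selected for prediction and None elsewhere
    (positions the MLM loss would ignore).
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Simplification: select each position independently with mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: random vocabulary token
        # Remaining 10%: keep the original token unchanged.
    return corrupted, labels


# Usage on a toy legal sentence (hypothetical vocabulary).
sentence = "the lessee shall pay the rent monthly".split()
toy_vocab = ["court", "contract", "party", "clause"]
masked, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
print(masked)
print(targets)
```

Keeping 10% of selected tokens unchanged (and randomizing another 10%) is BERT's way of stopping the model from relying on the [MASK] symbol, which never appears at fine-tuning time.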

Abstract

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT,…
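The abstract contrasts the usual BERT fine-tuning guidelines with a broader hyper-parameter search space. A minimal grid-search sketch of that contrast follows. `BERT_DEFAULT_SPACE` holds the fine-tuning values recommended in the original BERT paper (Devlin et al., 2019); `BROADER_SPACE` is a hypothetical wider grid for illustration only and is not the set of values used in this paper. The `evaluate` callable is also hypothetical: in practice it would fine-tune a model with the given configuration and return a development-set score.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Fine-tuning grid recommended in the original BERT paper (Devlin et al., 2019).
BERT_DEFAULT_SPACE: Dict[str, List[Any]] = {
    "learning_rate": [5e-5, 3e-5, 2e-5],
    "batch_size": [16, 32],
    "epochs": [2, 3, 4],
}

# HYPOTHETICAL broader space, for illustration only (not the paper's values).
BROADER_SPACE: Dict[str, List[Any]] = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5],
    "batch_size": [4, 8, 16, 32],
    "dropout": [0.1, 0.2],
    "max_epochs": [20],  # assumed to be paired with early stopping on dev data
}


def grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the values in `space` (full grid search)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def best_config(
    space: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Return the configuration with the highest score; ties keep the first seen.

    `evaluate` is a hypothetical callable that fine-tunes with a config and
    returns a development-set score (e.g. macro-F1).
    """
    best: Dict[str, Any] = {}
    best_score = float("-inf")
    for cfg in grid(space):
        score = evaluate(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best, best_score


print(len(list(grid(BERT_DEFAULT_SPACE))))  # 3 * 2 * 3 = 18 runs
print(len(list(grid(BROADER_SPACE))))       # 5 * 4 * 2 * 1 = 40 runs
```

The run counts make the cost of a broader search explicit: each extra value multiplies the number of fine-tuning runs, so widening the grid trades compute for a better chance of finding a configuration that suits the target domain.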


Taxonomy

Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
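Among the listed methods, Linear Warmup With Linear Decay is the learning-rate schedule used in BERT pre-training: the rate rises linearly from 0 to a peak over a warmup period, then falls linearly to 0 by the last training step. A minimal sketch, with the same shape as the `get_linear_schedule_with_warmup` schedule in Hugging Face Transformers, is below. The function name is hypothetical, and the example settings (peak 1e-4, 10,000 warmup steps, 1,000,000 total steps) are the values Devlin et al. report for the original BERT; LEGAL-BERT's own settings may differ.

```python
def linear_warmup_linear_decay(
    step: int, peak_lr: float, warmup_steps: int, total_steps: int
) -> float:
    """Learning rate at `step`: linear ramp 0 -> peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: grow linearly from 0 at step 0 to peak_lr at warmup_steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: shrink linearly to 0 at total_steps; clamp at 0 afterwards.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Example settings from the original BERT pre-training setup.
for s in (0, 5_000, 10_000, 505_000, 1_000_000):
    print(s, linear_warmup_linear_decay(s, 1e-4, 10_000, 1_000_000))
```

The warmup phase keeps early updates small while optimizer statistics are still noisy; the `max(1, ...)` guards avoid division by zero when warmup is disabled or equals the total step count.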