Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain

Ha-Thanh Nguyen; Le-Minh Nguyen

arXiv:2104.07782 · cs.CL · September 6, 2021 · 1 citation

TL;DR

Pretrained models often fail to account for the legal sublanguage, which risks misapplying them in practice. This paper introduces BERTLaw, a pretrained model specialized for legal language, and shows that it is more effective than a baseline pretrained model.

Contribution

The paper presents BERTLaw, a pretrained model tailored to legal sublanguage, addressing a critical gap in applying general models to legal texts.

Findings

1. BERTLaw outperforms the baseline pretrained model on legal NLP tasks.
2. The legal sublanguage significantly affects model effectiveness.
3. Domain-specific pretrained models improve the understanding of legal text.

Abstract

Legal English is a sublanguage that matters to everyone, yet not everyone can understand it. Pretrained models have become best practice in current deep learning approaches to many different problems. It would be wasteful, or even dangerous, to apply these models in practice without knowledge of the legal sublanguage. In this paper, we raise this issue and propose a straightforward solution: BERTLaw, a pretrained model for the legal sublanguage. Our experiments demonstrate that the method is more effective than the baseline pretrained model.
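
The sublanguage concern can be made concrete at the tokenizer level. The sketch below assumes a BERT-style WordPiece tokenizer (greedy longest-match-first) and shows how a vocabulary built without legal text can fragment a legal term such as "estoppel" into sub-word pieces, while a vocabulary that includes legal terms keeps it whole. The vocabularies `GENERAL_VOCAB` and `LEGAL_VOCAB` and the helper names are hypothetical toy examples, not BERTLaw's actual vocabulary or code.

```python
from typing import List, Set

UNK = "[UNK]"


def wordpiece(word: str, vocab: Set[str], max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    if len(word) > max_chars:
        return [UNK]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [UNK]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Lowercase, split on whitespace, then WordPiece each word."""
    return [piece for word in text.lower().split() for piece in wordpiece(word, vocab)]


# Hypothetical toy vocabularies, for illustration only.
GENERAL_VOCAB = {"the", "court", "app", "##lies", "est", "##op", "##pel"}
LEGAL_VOCAB = GENERAL_VOCAB | {"applies", "estoppel"}

if __name__ == "__main__":
    sentence = "The court applies estoppel"
    print(tokenize(sentence, GENERAL_VOCAB))
    # → ['the', 'court', 'app', '##lies', 'est', '##op', '##pel']
    print(tokenize(sentence, LEGAL_VOCAB))
    # → ['the', 'court', 'applies', 'estoppel']
```

Fragmentation alone does not prove that a model misreads a term, but it is one plausible way a general-purpose vocabulary can under-represent a sublanguage, leaving the model to assemble legal meaning from generic pieces.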

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Artificial Intelligence in Law