Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
Ha-Thanh Nguyen, Le-Minh Nguyen

TL;DR
Pretrained models often fail to account for the legal sublanguage, which risks their misapplication. This paper introduces BERTLaw, a pretrained model specialized for legal language, and shows that it is more effective than a baseline pretrained model.
Contribution
The paper presents BERTLaw, a pretrained model tailored to legal sublanguage, addressing a critical gap in applying general models to legal texts.
Findings
BERTLaw outperforms the baseline pretrained model on legal NLP tasks.
Legal sublanguage significantly impacts model effectiveness.
Introducing domain-specific pretrained models improves legal text understanding.
Abstract
Legal English is a sublanguage that matters to everyone but is not easily understood by everyone. Pretrained models have become best practice in current deep learning approaches to many problems. It would be wasteful, or even dangerous, to apply these models in practice without knowledge of the sublanguage of the law. In this paper, we raise this issue and propose a simple solution: BERTLaw, a pretrained model for the legal sublanguage. Our experiments show that this method is more effective than the baseline pretrained model.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Artificial Intelligence in Law
