LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension

Wenyue Hua, Yuchen Zhang, Zhe Chen, Josie Li, and Melanie Weber

arXiv:2212.08204·cs.CL·December 19, 2022·5 cites

TL;DR

LegalRelectra is a mixed-domain language model trained on legal and medical corpora for long-range comprehension of legal text. It outperforms general-domain and single-domain models on mixed-domain (personal injury) documents.

Contribution

The paper introduces a mixed-domain legal-medical language model built on the Electra and Reformer architectures, aimed at improving long-range comprehension of specialized legal texts.

Findings

01 Improves processing of mixed-domain legal and medical texts

02 Enhances long-range text comprehension with the Reformer architecture

03 Outperforms general and single-domain models on legal tasks

Abstract

The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models and are limited in their ability to process specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture…

Taxonomy

Topics: Artificial Intelligence in Law · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · 1x1 Convolution · Convolution · Reversible Residual Block · Adafactor · Locality Sensitive Hashing Attention · Dropout · Linear Layer · Byte Pair Encoding
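
The Methods list names the reversible residual block, the Reformer component that allows a layer's inputs to be recomputed from its outputs instead of being stored during training. The following is a minimal pure-Python sketch of that general idea, not code from the paper: `toy_f` and `toy_g` are hypothetical stand-ins for the attention and feed-forward sublayers, and the vectors are plain lists chosen for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def rev_forward(x1: Vector, x2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    """Forward pass: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def rev_inverse(y1: Vector, y2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    """Recover the inputs from the outputs: x2 = y2 - g(y1), x1 = y1 - f(x2)."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


# Hypothetical sublayers (assumptions for illustration only).
def toy_f(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def toy_g(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


x1, x2 = [1.0, 2.0], [3.0, 4.0]
y1, y2 = rev_forward(x1, x2, toy_f, toy_g)
r1, r2 = rev_inverse(y1, y2, toy_f, toy_g)
```

Because the inverse only needs the outputs and the two sublayers, intermediate activations do not have to be kept in memory, which is one reason such blocks are attractive for long input sequences.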