No Argument Left Behind: Overlapping Chunks for Faster Processing of Arbitrarily Long Legal Texts

Israel Fama; Bárbara Bueno; Alexandre Alcoforado; Thomas Palmeira Ferraz; Arnold Moya; Anna Helena Reali Costa

arXiv:2410.19184·cs.CL·December 17, 2024

TL;DR

This paper presents uBERT, a hybrid Transformer-RNN model for efficiently analyzing arbitrarily long legal texts, motivated by the slow processing of millions of cases in large judiciary systems such as Brazil's.

Contribution

The paper introduces uBERT, a hybrid model that combines Transformer and RNN architectures to process long legal texts in full, regardless of their length, at reasonable computational cost.
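The general Transformer-then-RNN pattern can be illustrated with a minimal, dependency-free sketch. This is not the authors' uBERT implementation, and every name in it is hypothetical: `encode_chunk` is a stand-in for a Transformer encoder applied to each chunk (for example, a BERT [CLS] vector), and a plain tanh (Elman) RNN stands in for the recurrent layer, where an LSTM or GRU would be more typical. Dimensions and weights are illustrative only.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def encode_chunk(chunk: Sequence[int], dim: int = 4) -> Vector:
    """Hypothetical stand-in for a Transformer chunk encoder: returns the
    mean of deterministic pseudo-random embeddings of the chunk's token ids."""
    if not chunk:
        return [0.0] * dim
    embs = []
    for tok in chunk:
        rng = random.Random(tok)  # same token id -> same embedding
        embs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return [sum(col) / len(embs) for col in zip(*embs)]


class ElmanRNN:
    """Minimal tanh RNN standing in for the recurrent layer (hypothetical)."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hid_dim)]
        self.U = [[rng.uniform(-0.5, 0.5) for _ in range(hid_dim)] for _ in range(hid_dim)]

    def step(self, x: Vector, h: Vector) -> Vector:
        # h_t = tanh(W x_t + U h_{t-1})
        return [
            math.tanh(sum(w * xi for w, xi in zip(self.W[i], x))
                      + sum(u * hj for u, hj in zip(self.U[i], h)))
            for i in range(self.hid_dim)
        ]

    def run(self, xs: List[Vector]) -> Vector:
        h = [0.0] * self.hid_dim
        for x in xs:
            h = self.step(x, h)
        return h  # final hidden state summarizes the whole document


def document_vector(chunks: List[Sequence[int]], rnn: ElmanRNN, dim: int = 4) -> Vector:
    # Each chunk is encoded independently (batchable); only the short
    # sequence of chunk vectors is processed recurrently.
    chunk_vecs = [encode_chunk(c, dim) for c in chunks]
    return rnn.run(chunk_vecs)


chunks = [[101, 7, 8, 9], [8, 9, 10, 11], [10, 11, 12]]  # overlapping token-id chunks
rnn = ElmanRNN(in_dim=4, hid_dim=3)
doc_vec = document_vector(chunks, rnn)
print(len(doc_vec))  # 3
```

In this pattern the recurrent layer sees one step per chunk rather than one per token, and the resulting document vector could feed a classification head.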

Findings

01

uBERT outperforms BERT+LSTM when overlapping input is used.

02

uBERT is significantly faster than ULMFiT for long documents.

03

The approach processes the full text, regardless of length, with reasonable computational overhead.
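The overlapping input mentioned in the findings can be sketched as a sliding window over token ids. The function name `overlapping_chunks` and the window and overlap sizes below are assumptions for illustration; the paper's actual chunking parameters are not given here.

```python
from typing import List, Sequence


def overlapping_chunks(tokens: Sequence[int], window: int, overlap: int) -> List[List[int]]:
    """Hypothetical helper: split `tokens` into windows of at most `window`
    tokens, where each window repeats the last `overlap` tokens of the
    previous one, so no token is dropped."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    if not tokens:
        return []
    stride = window - overlap  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):  # this window reached the end
            break
        start += stride
    return chunks


print(overlapping_chunks(list(range(10)), window=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With a hypothetical 512-token window and 128-token overlap, a 5,000-token document yields 13 chunks, versus 10 without overlap; every token still lands in some chunk, so nothing is truncated.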

Abstract

In a context where the Brazilian judiciary system, the largest in the world, faces a crisis due to the slow processing of millions of cases, it becomes imperative to develop efficient methods for analyzing legal texts. We introduce uBERT, a hybrid model that combines Transformer and Recurrent Neural Network architectures to effectively handle long legal texts. Our approach processes the full text regardless of its length while maintaining reasonable computational overhead. Our experiments demonstrate that uBERT achieves superior performance compared to BERT+LSTM when overlapping input is used and is significantly faster than ULMFiT for processing long legal documents.
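A back-of-the-envelope cost model helps make "processes the full text regardless of its length while maintaining reasonable computational overhead" concrete. It rests on assumptions not stated in the abstract: a document of $n$ tokens, a Transformer window of $w$ tokens, an overlap of $o$ tokens, stride $s = w - o$, and self-attention cost proportional to the square of the input length.

```latex
k(n) =
\begin{cases}
1, & n \le w,\\[2pt]
\left\lceil \dfrac{n - w}{s} \right\rceil + 1, & n > w,
\end{cases}
\qquad s = w - o

\text{attention cost} \;\propto\; k(n)\, w^{2} \;\approx\; \frac{n}{s}\, w^{2} = O(n)
\quad (n \gg w,\ w \text{ and } s \text{ fixed})

\frac{\text{tokens encoded with overlap}}{\text{tokens encoded without overlap}} \;\approx\; \frac{w}{s}
```

For illustrative values $w = 512$ and $o = 128$, $s = 384$ and $w/s = 512/384 \approx 1.33$: under this model, overlap costs roughly one third more encoder work, while total cost still grows linearly with document length rather than quadratically.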

Taxonomy

Topics: Artificial Intelligence in Law · Comparative and International Law Studies · Legal Education and Practice Innovations

Methods: Attention Is All You Need · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Temporal Activation Regularization · Weight Tying · Slanted Triangular Learning Rates · Dense Connections · Label Smoothing · Byte Pair Encoding