Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
Daniel da Silva Junior, Paulo Roberto dos S. Corval, Aline Paes, and Daniel de Oliveira

TL;DR
This paper introduces four datasets for Portuguese legal textual similarity and compares heuristic labels obtained through weak supervision with expert annotations, to support AI applications in legal document analysis.
Contribution
It provides new legal datasets with heuristic and expert labels, and evaluates the effectiveness of heuristic labeling for semantic similarity tasks.
Findings
Heuristic labels are useful for legal text similarity.
Expert annotations reveal challenges in semantic analysis.
Datasets facilitate AI development in the legal domain.
Abstract
The Brazilian judiciary has a heavy workload, so legal proceedings take a long time to conclude. In Resolution 469/2022, the Brazilian National Council of Justice established formal guidance for digitalizing documents and processes, opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly with the large number of texts produced in routine legal procedures. Notably, Artificial Intelligence (AI) techniques can process and extract useful information from textual data, potentially speeding up these proceedings. However, the legal-domain datasets required by several AI techniques are scarce and difficult to obtain, as they need labels from experts. To address this challenge, this article contributes four datasets from the legal domain: two with documents and metadata but unlabeled, and another two labeled with a…
Taxonomy
Topics: Artificial Intelligence in Law · Topic Modeling · Natural Language Processing Techniques
