Joint Span Segmentation and Rhetorical Role Labeling with Data Augmentation for Legal Documents
T.Y.S.S. Santosh, Philipp Bock, Matthias Grabmair

TL;DR
This paper introduces a span-level approach using semi-Markov CRFs for joint segmentation and rhetorical role labeling in legal documents, enhanced by data augmentation to address domain-specific data scarcity.
Contribution
It reformulates rhetorical role labeling as span classification, employing semi-Markov CRFs for joint learning, and explores data augmentation strategies to improve performance in legal texts.
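To make the span-level reformulation concrete, here is a minimal sketch of how per-sentence labels might be collapsed into spans of consecutive sentences that share a role. The function name `labels_to_spans` and the role names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import groupby


def labels_to_spans(labels):
    """Collapse per-sentence labels into (start, end, label) spans.

    `end` is exclusive, so a span covers sentences start .. end-1.
    """
    spans = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))  # number of consecutive sentences with this label
        spans.append((start, start + length, label))
        start += length
    return spans


# Hypothetical role names, used only for illustration.
sentence_labels = ["FACTS", "FACTS", "ARGUMENT", "RULING", "RULING", "RULING"]
spans = labels_to_spans(sentence_labels)
```

Under this view, a sequence labeler predicts one label per sentence, whereas a span-level model predicts the `(start, end, label)` triples directly.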
Findings
Semi-Markov CRF outperforms baseline CRF in span prediction.
Multi-sentence spans improve model performance.
Data augmentation strategies enhance prediction metrics.
Abstract
Segmentation and Rhetorical Role Labeling of legal judgements play a crucial role in retrieval and adjacent tasks, such as case summarization, semantic search, and argument mining. Previous approaches have formulated this task either as independent classification or sequence labeling of sentences. In this work, we reformulate the task at span level as identifying spans of multiple consecutive sentences that share the same rhetorical role label to be assigned via classification. We employ semi-Markov Conditional Random Fields (CRF) to jointly learn span segmentation and span label assignment. We further explore three data augmentation strategies to mitigate the data scarcity in the specialized domain of law where individual documents tend to be very long and annotation cost is high. Our experiments demonstrate improvement of span-level prediction metrics with a semi-Markov CRF model…
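The joint segmentation and labeling described above can be illustrated with a generic semi-Markov Viterbi decoder. This is a sketch under stated assumptions, not the authors' implementation: the score tensors `span_scores` and `transitions`, the maximum span length `max_len`, and the function name are all hypothetical, and in practice the scores would come from a trained model. The sketch also does not forbid two adjacent spans from sharing a label, which some formulations do.

```python
import numpy as np


def semi_markov_viterbi(span_scores, transitions, max_len):
    """Find the highest-scoring segmentation into labeled spans.

    span_scores[i, d - 1, y]: score of a span starting at sentence i,
        covering d sentences, with label y (assumed given).
    transitions[y_prev, y]: score of label y following label y_prev.
    Returns a list of (start, end, label) with end exclusive.
    """
    n, _, num_labels = span_scores.shape
    # best[j, y]: best score for the first j sentences when the last span has label y
    best = np.full((n + 1, num_labels), -np.inf)
    back = {}  # (j, y) -> (start of last span, label of previous span or None)

    for j in range(1, n + 1):
        for d in range(1, min(max_len, j) + 1):
            i = j - d  # the candidate last span covers sentences i .. j-1
            for y in range(num_labels):
                emit = span_scores[i, d - 1, y]
                if i == 0:
                    score, prev = emit, None  # first span has no predecessor
                else:
                    cand = best[i] + transitions[:, y]
                    prev = int(np.argmax(cand))
                    score = cand[prev] + emit
                if score > best[j, y]:
                    best[j, y] = score
                    back[(j, y)] = (i, prev)

    # Follow back-pointers from the end of the document to the start.
    spans = []
    j, y = n, int(np.argmax(best[n]))
    while j > 0:
        i, prev = back[(j, y)]
        spans.append((i, j, y))
        j, y = i, prev
    return spans[::-1]


# Toy example: 4 sentences, 2 labels, spans of up to 3 sentences.
scores = np.zeros((4, 3, 2))
scores[0, 1, 0] = 5.0  # sentences 0-1 as label 0
scores[2, 1, 1] = 5.0  # sentences 2-3 as label 1
decoded = semi_markov_viterbi(scores, np.zeros((2, 2)), max_len=3)
```

Unlike a linear-chain CRF, which scores one sentence at a time, the recursion here scores whole candidate spans, so segment boundaries and labels are chosen together. The cost is an extra factor of `max_len` in decoding time.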
Taxonomy
Topics: Artificial Intelligence in Law · Natural Language Processing Techniques · Topic Modeling
Methods: Conditional Random Field
