Legal Area Classification: A Comparative Study of Text Classifiers on Singapore Supreme Court Judgments
Jerrold Soh Tsin Howe, Lim How Khang, Ian Ernst Chai

TL;DR
This study compares various machine learning methods for classifying Singapore Supreme Court judgments into legal areas, highlighting how NLP techniques perform on a novel corpus of few but lengthy legal documents.
Contribution
It introduces a new dataset of Singapore legal judgments and evaluates the effectiveness of different ML and NLP classifiers in legal text classification.
Findings
All models performed well with as few as a few hundred judgments
State-of-the-art NLP methods show promise but need further optimization for legal texts
Traditional models remain competitive in legal classification tasks
Abstract
This paper conducts a comparative study on the performance of various machine learning ("ML") approaches for classifying judgments into legal areas. Using a novel dataset of 6,227 Singapore Supreme Court judgments, we investigate how state-of-the-art NLP methods compare against traditional statistical models when applied to a legal corpus that comprised few but lengthy documents. All approaches tested, including topic model, word embedding, and language model-based classifiers, performed well with as little as a few hundred judgments. However, more work needs to be done to optimize state-of-the-art methods for the legal domain.
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Judicial and Constitutional Studies
