Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts
Moinul Hossain, Sourav Rabi Das, Zikrul Shariar Ayon, Sadia Afrin Promi, Ahnaf Atef Choudhury, Shakila Rahman, Jia Uddin

TL;DR
This paper introduces a lightweight CNN-based framework for legal document classification that outperforms larger models in accuracy and speed, demonstrating efficiency and scalability in legal AI applications.
Contribution
The work presents a novel CNN-driven system for citation-treatment classification that combines lemmatisation-based preprocessing with subword-aware FastText embeddings, achieving high accuracy at low computational cost in legal document analysis.
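FastText gets its subword awareness by representing each word as a bag of character n-grams wrapped in `<` and `>` boundary markers. Related inflections and rare legal terms therefore share vectors. The sketch below covers only the n-gram extraction step, in plain Python. It uses the 3 to 6 character range that FastText applies by default when training word vectors. The function name `char_ngrams` and the example words are illustrative. The sketch omits FastText's hashing of n-grams into buckets and the learned vectors themselves.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return FastText-style character n-grams of a single word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes are
    distinguishable from the same characters in the middle of a word.
    """
    bracketed = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n across the bracketed word.
        for start in range(len(bracketed) - n + 1):
            grams.append(bracketed[start:start + n])
    return grams


# Hypothetical example: two inflections share most of their n-grams.
shared = set(char_ngrams("overruled")) & set(char_ngrams("overruling"))
print(sorted(g for g in shared if len(g) == 3))
# ['<ov', 'err', 'ove', 'rru', 'rul', 'ver']
```

FastText builds a word's embedding by combining the vectors of its n-grams, plus a vector for the word itself when the word is in the vocabulary. This is why forms unseen during training still receive meaningful representations.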
Findings
Achieved 97.26% accuracy and a 96.82% macro F1-score on the held-out 25% test split of a 25,000-document legal corpus.
Uses only 5.1 million parameters, with an inference time of 0.31 ms per document.
Outperforms fine-tuned BERT and the other baselines (LSTM with FastText, CNN with random embeddings, and a TF-IDF-based model) in accuracy and speed.
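The findings report both accuracy and macro F1, and the two measure different things. Accuracy counts correct predictions over all documents. Macro F1 computes an F1-score per class and averages those scores with equal weight, so minority treatment classes count as much as frequent ones. The plain-Python sketch below computes both metrics on a tiny hypothetical example. The labels `cited` and `distinguished` are invented for illustration and are not taken from the paper's corpus.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores over every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels: 8 "cited" and 2 "distinguished" documents, one error.
gold = ["cited"] * 8 + ["distinguished"] * 2
pred = ["cited"] * 8 + ["cited", "distinguished"]
print(round(accuracy(gold, pred), 4))  # 0.9
print(round(macro_f1(gold, pred), 4))  # 0.8039
```

In this example the classifier reaches 0.90 accuracy but only about 0.80 macro F1, because its single error falls in the rare class. A macro F1 close to accuracy therefore suggests that performance is fairly even across classes.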
Abstract
Legal practitioners and judicial institutions face an ever-growing volume of case-law documents characterised by formalised language, lengthy sentence structures, and highly specialised terminology, making manual triage both time-consuming and error-prone. This work presents a lightweight yet high-accuracy framework for citation-treatment classification that pairs lemmatisation-based preprocessing with subword-aware FastText embeddings and a multi-kernel one-dimensional Convolutional Neural Network (CNN). Evaluated on a publicly available corpus of 25,000 annotated legal documents with a 75/25 training-test partition, the proposed system achieves 97.26% classification accuracy and a macro F1-score of 96.82%, surpassing established baselines including fine-tuned BERT, Long Short-Term Memory (LSTM) with FastText, CNN with random embeddings, and a Term Frequency-Inverse Document Frequency…
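The abstract describes the classifier as a multi-kernel one-dimensional CNN over FastText embeddings, but it does not give kernel widths, filter counts, or pooling details. The code below is a minimal, dependency-free sketch of that general family of models: parallel convolutions of several widths, ReLU, max-over-time pooling, concatenation, and a softmax output layer. It uses hand-set toy weights. The function names, the widths 2 and 3, and the pooling choice are assumptions for illustration, not the authors' configuration.

```python
import math


def conv1d_maxpool(embeddings, filters, width):
    """Valid 1-D convolution over the token axis, ReLU, then max-over-time pooling.

    embeddings: list of token vectors (seq_len x dim).
    filters: list of (weights, bias) pairs; weights is a width x dim matrix.
    Returns one pooled activation per filter. If the sequence is shorter than
    the kernel width there are no windows and the pooled value stays 0.0.
    """
    pooled = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are non-negative, so 0.0 is a valid floor
        for start in range(len(embeddings) - width + 1):
            activation = bias
            for offset in range(width):
                token_vec = embeddings[start + offset]
                activation += sum(w * x for w, x in zip(weights[offset], token_vec))
            best = max(best, max(0.0, activation))  # ReLU, then running max
        pooled.append(best)
    return pooled


def multi_kernel_features(embeddings, kernel_banks):
    """Run one filter bank per kernel width and concatenate the pooled outputs."""
    features = []
    for width, filters in kernel_banks:
        features.extend(conv1d_maxpool(embeddings, filters, width))
    return features


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(embeddings, kernel_banks, out_weights, out_bias):
    """Class probabilities from a linear layer over the concatenated features."""
    features = multi_kernel_features(embeddings, kernel_banks)
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


# Toy example: three 2-dimensional vectors standing in for FastText embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
banks = [
    (2, [([[1.0, 0.0], [0.0, 1.0]], 0.0)]),               # one width-2 filter
    (3, [([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0)]),  # one width-3 filter
]
print(multi_kernel_features(tokens, banks))  # [2.0, 3.0]
probs = predict(tokens, banks, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

A real model would learn the filters and output weights by backpropagation in a deep-learning framework and would pad short documents. Running several kernel widths in parallel lets the network respond to phrases of different lengths, such as two-word and three-word spans, in a single pass.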
