Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review
Robert Keeling, Rishi Chhatwal, Nathaniel Huber-Fliflet, Jianping Zhang, Fusheng Wei, Haozhen Zhao, Shi Ye, Han Qin

TL;DR
This study empirically compares CNN with traditional machine learning algorithms like Logistic Regression, SVM, and Random Forest on real-world legal document review data of varying lengths, assessing their performance in text classification.
Contribution
It compares CNN with Logistic Regression, SVM, and Random Forest on data from four actual legal document reviews with documents of varying lengths, and shows that no single algorithm performs best across all scenarios.
Findings
CNN performed well but was not universally superior.
Performance varied depending on dataset and training size.
No single algorithm outperformed others in all cases.
Abstract
Research has shown that Convolutional Neural Networks (CNN) can be effectively applied to text classification as part of a predictive coding protocol. That said, most research to date has been conducted on data sets with short documents that do not reflect the variety of documents in real world document reviews. Using data from four actual reviews with documents of varying lengths, we compared CNN with other popular machine learning algorithms for text classification, including Logistic Regression, Support Vector Machine, and Random Forest. For each data set, classification models were trained with different training sample sizes using different learning algorithms. These models were then evaluated using a large randomly sampled test set of documents, and the results were compared using precision and recall curves. Our study demonstrates that CNN performed well, but that there was no…
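The abstract says the models were compared using precision and recall curves computed on a test set. The sketch below illustrates how such a curve can be derived from ranked model scores. It is not the authors' code: the function names (`precision_recall_curve`, `precision_at_recall`) and the toy scores and labels are hypothetical, and the paper's actual evaluation procedure may differ in its details.

```python
from typing import List, Tuple


def precision_recall_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return one (recall, precision) pair per cutoff depth in the ranked list.

    scores: model scores, higher means more likely responsive (assumed convention).
    labels: 1 for a responsive document, 0 otherwise.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("need at least one positive label")

    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    curve = []
    true_positives = 0
    for depth, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / depth          # fraction of reviewed docs that are responsive
        recall = true_positives / total_relevant    # fraction of responsive docs found so far
        curve.append((recall, precision))
    return curve


def precision_at_recall(curve: List[Tuple[float, float]], target_recall: float) -> float:
    """Precision at the shallowest cutoff that reaches the target recall."""
    for recall, precision in curve:
        if recall >= target_recall:
            return precision
    raise ValueError("target recall is never reached")


# Hypothetical toy data: six test documents scored by one model.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_curve = precision_recall_curve(toy_scores, toy_labels)
```

Computing one curve per algorithm and per training sample size, all on the same test set, gives curves that can be compared directly, for example by reading off the precision each model achieves at a fixed recall level.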
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Law
Methods: Test · Logistic Regression
