Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness
Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, Haozhen Zhao

TL;DR
This paper empirically evaluates how preprocessing parameters and algorithms influence the effectiveness of predictive coding in legal data review, highlighting the importance of parameter tuning for optimal results.
Contribution
The paper systematically analyzes how preprocessing parameters and learning algorithms affect predictive coding accuracy, using multiple real-world datasets.
Findings
Preprocessing parameters significantly affect predictive coding accuracy.
Different algorithms yield varying effectiveness depending on data characteristics.
Optimal parameter settings improve model performance and efficiency.
Abstract
Predictive coding, once used in only a small fraction of legal and business matters, is now widely deployed to quickly cull through increasingly vast amounts of data and reduce the need for costly and inefficient human document review. Previously, the sole front-end input used to create a predictive model was the exemplar documents (training data) chosen by subject-matter experts. Many predictive coding tools require users to rely on static preprocessing parameters and a single machine learning algorithm to develop the predictive model. Little research has been published discussing the impact preprocessing parameters and learning algorithms have on the effectiveness of the technology. A deeper dive into the generation of a predictive model shows that the settings and algorithm can have a strong effect on the accuracy and efficacy of a predictive coding tool. Understanding how these…
