Empirical Evaluations of Active Learning Strategies in Legal Document Review
Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, Haozhen Zhao

TL;DR
This study empirically evaluates active learning strategies in legal document review, revealing that the most popular approach quickly identifies key documents but becomes less efficient over time, suggesting alternative strategies may be more effective.
Contribution
It provides real-world experimental insights into the effectiveness of active learning in legal document review, challenging assumptions about its superiority and proposing tailored strategies.
Findings
Popular active learning methods lose efficiency over time
Most effective initial strategies differ from ongoing review methods
Large, real-world legal datasets used for evaluation
Abstract
One type of machine learning, text classification, is now regularly applied in legal matters involving voluminous document populations because it can reduce the time and expense associated with the review of those documents. One form of machine learning, Active Learning, has drawn attention from the legal community because it offers the potential to make the machine learning process even more effective. Active Learning, applied to legal documents, is considered a new technology in the legal domain and is continuously applied to all documents in a legal matter until an insignificant number of relevant documents are left for review. This implementation is slightly different from traditional implementations of Active Learning, where the process stops once the model achieves acceptable performance. The purpose of this paper is twofold: (i) to question whether Active Learning actually is a…
