An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters
Peter Gronvall, Nathaniel Huber-Fliflet, Jianping Zhang, Robert Keeling, Robert Neary, Haozhen Zhao

TL;DR
This paper compares keyword search and predictive modeling methods for identifying privileged documents in legal review, highlighting their respective strengths and weaknesses based on a real-world data study.
Contribution
It provides an empirical evaluation of the effectiveness of keyword searching versus predictive modeling in legal privilege document review.
Findings
Predictive modeling outperforms keyword search in identifying privileged documents.
Keyword searches tend to produce more over-inclusive results, increasing review costs.
Predictive models can reduce false positives and improve review efficiency.
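To make the over-inclusiveness point concrete, here is a minimal sketch of how keyword flagging can be scored against reviewer labels using precision and recall. The term list, the sample documents, and the helper names (`keyword_hit`, `precision_recall`) are all hypothetical and invented for illustration; none of them come from the study or its data.

```python
import re

# Hypothetical privilege terms; NOT the term list used in the study.
PRIVILEGE_TERMS = ["attorney", "counsel", "privileged", "legal advice"]
_PATTERN = re.compile("|".join(re.escape(t) for t in PRIVILEGE_TERMS), re.IGNORECASE)


def keyword_hit(text: str) -> bool:
    """Return True if any privilege term appears anywhere in the text."""
    return _PATTERN.search(text) is not None


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy documents invented for illustration: (text, is actually privileged).
DOCS = [
    ("Per our attorney, here is the legal advice on the merger.", True),
    ("Counsel recommends we settle; privileged and confidential.", True),
    ("Lunch with the general counsel's office was rescheduled.", False),
    ("This email is privileged and confidential. See attached invoice.", False),
    ("Quarterly sales numbers attached.", False),
]

flags = [keyword_hit(text) for text, _ in DOCS]
labels = [label for _, label in DOCS]
precision, recall = precision_recall(flags, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy set the keyword search catches both privileged documents but also flags two non-privileged ones (a scheduling note and a boilerplate confidentiality footer), which is the wide-net pattern described above. A predictive model's thresholded scores could be passed to the same `precision_recall` helper for a side-by-side comparison.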
Abstract
Protecting privileged communications and data from disclosure is paramount for legal teams. Unrestricted legal advice, such as attorney-client communications or litigation strategy, is vital to the legal process and is exempt from disclosure in litigation or regulatory events. To protect this information from being disclosed, companies and outside counsel must review vast amounts of documents to determine those that contain privileged material. This process is extremely costly and time-consuming. As data volumes increase, legal counsel employ methods to reduce the number of documents requiring review while balancing the need to ensure the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but return over-inclusive results --…
