Is there something I'm missing? Topic Modeling in eDiscovery

Herbert L. Roitblat

arXiv:2007.15731 · cs.IR · August 3, 2020 · 1 citation

Open Access

TL;DR

This study demonstrates that in legal eDiscovery, even with incomplete document retrieval, all relevant topics can still be identified, supporting the idea that search can be both efficient and complete in terms of topics.

Contribution

Using Latent Dirichlet Allocation topic models and machine learning classifiers, the paper shows that a document set retrieved at less than full Recall in eDiscovery can still contain all of the relevant topics, which challenges the assumed need for 100% Recall.

Findings

01. Less than full document recall still captures all topics.

02. Naive Bayes and SVM classifiers both find all topics in the hit set.

03. Topic coverage remains complete despite missing relevant documents.

Abstract

In legal eDiscovery, the parties are required to search through their electronically stored information to find documents that are relevant to a specific case. Negotiations over the scope of these searches are often based on a fear that something will be missed. This paper continues an argument that discovery should be based on identifying the facts of a case. If a search process is less than complete (if it has Recall less than 100%), it may still be complete in presenting all of the relevant available topics. In this study, Latent Dirichlet Allocation was used to identify 100 topics from all of the known relevant documents. The documents were then categorized to about 80% Recall (i.e., 80% of the relevant documents were found by the categorizer, designated the hit set and 20% were missed, designated the missed set). Despite the fact that less than all of the relevant documents were…
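The comparison the abstract describes, between document Recall and the topics present in the hit set, can be illustrated with a minimal Python sketch. It is not the paper's code: it assumes each relevant document has already been assigned a single dominant topic id (for example from an LDA model), which is a simplification, and the function names `recall` and `topic_coverage` and the toy data are hypothetical.

```python
def recall(hit_flags):
    """Fraction of relevant documents that the categorizer found."""
    return sum(hit_flags) / len(hit_flags)


def topic_coverage(doc_topics, hit_flags):
    """Compare topics present in the hit set with those in all relevant documents.

    doc_topics: one topic id per relevant document (assumption: its dominant topic).
    hit_flags:  True if the categorizer retrieved that document, else False.
    Returns the fraction of topics found in the hit set and the sorted list of
    topics that occur only in the missed set.
    """
    all_topics = set(doc_topics)
    hit_topics = {t for t, hit in zip(doc_topics, hit_flags) if hit}
    missed_only = all_topics - hit_topics
    return len(hit_topics) / len(all_topics), sorted(missed_only)


# Toy data (hypothetical): 10 relevant documents over 4 topics, 8 retrieved.
doc_topics = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
hit_flags = [True, True, True, False, True, True, True, True, False, True]

doc_recall = recall(hit_flags)
coverage, missed_only = topic_coverage(doc_topics, hit_flags)
print(f"document recall = {doc_recall:.2f}, topic coverage = {coverage:.2f}")
print("topics found only in the missed set:", missed_only)
```

In this toy example the two missed documents belong to topics that are also represented in the hit set, so document Recall is below 100% while topic coverage is complete. A topic would count as missed only if every document carrying it fell in the missed set.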

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Advanced Graph Neural Networks