FOMO: Topics versus documents in legal eDiscovery
Herbert Roitblat

TL;DR
This paper models the process of legal eDiscovery as identifying responsive information rather than documents, using probabilistic models to estimate the likelihood of missing relevant topics and analyzing real data sets to validate the approach.
Contribution
It introduces a simple probabilistic model for estimating the chance of omitting relevant information in eDiscovery, focusing on topics within documents rather than individual documents.
Findings
At least one example of each topic can be found with relatively few documents.
Non-random search order does not affect topic distribution.
The model helps estimate confidence in capturing all relevant information.
Abstract
In the United States, the parties to a lawsuit are required to search through their electronically stored information to find documents that are relevant to the specific case and produce them to the opposing party. Negotiations over the scope of these searches often reflect a fear that something will be missed (Fear of Missing Out: FOMO). A Recall level of 80%, for example, means that 20% of the relevant documents will be left unproduced. This paper argues that eDiscovery is the process of identifying responsive information, not identifying documents. Documents are the carriers of the information; they are not the direct targets of the process. A given document may contain one or more topics or factoids, and a factoid may appear in more than one document. The coupon collector's problem, Heaps' law, and other analyses provide ways to model the problem of finding information…
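The coupon collector's framing mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes documents are drawn independently and at random, that each topic appears in a fixed fraction of documents, and (for the expectation) that all topics are equally likely. The function names and the example numbers are hypothetical.

```python
import math


def expected_draws_to_see_all(n_topics: int) -> float:
    """Classic coupon collector expectation, n * H_n.

    Assumes every topic is equally likely on each draw (an illustrative
    simplification, not a claim about real document collections).
    """
    return n_topics * sum(1.0 / k for k in range(1, n_topics + 1))


def prob_topic_missed(p_topic: float, n_docs: int) -> float:
    """Chance that a topic present in a fraction p_topic of documents
    is absent from n_docs independent random draws."""
    return (1.0 - p_topic) ** n_docs


def docs_needed(p_topic: float, confidence: float) -> int:
    """Smallest n such that the topic is seen at least once with
    probability >= confidence, i.e. 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_topic))


if __name__ == "__main__":
    # Hypothetical example: a topic carried by 1% of documents.
    n = docs_needed(0.01, 0.95)
    print(n, prob_topic_missed(0.01, n))
    print(expected_draws_to_see_all(10))
```

Under these assumptions, the chance of missing a topic falls geometrically with the number of documents reviewed, which is the intuition behind needing relatively few documents to see each topic at least once.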
Taxonomy
Topics: Artificial Intelligence in Law · Judicial and Constitutional Studies · Natural Language Processing Techniques
