Analyzing Large Collections of Electronic Text Using OLAP
Steven Keith, Owen Kaser, Daniel Lemire

TL;DR
This paper presents a method for using OLAP systems to efficiently analyze large electronic text collections, enabling rapid, user-driven insights in humanities and social sciences research.
Contribution
It introduces a novel approach to storing and analyzing text data with OLAP, significantly reducing analysis time and supporting user-driven exploration.
Findings
Query response time reduced from minutes, hours, or even days to seconds
Supports user-driven research inquiries
Applicable to large text archives
Abstract
Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives allows a more complete analysis but makes results slower to obtain. On-Line Analytical Processing (OLAP) is a method used to store and quickly analyze multidimensional data. By storing text analysis information in an OLAP system, a user can obtain answers to inquiries in a matter of seconds rather than minutes, hours, or even days. This analysis is user-driven, allowing various users the freedom to pursue their own direction of research.
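The core idea of precomputing text statistics so that later questions become fast lookups can be illustrated with a small in-memory sketch. The code below is not the authors' implementation. The dimensions (author, year, term), the ALL wildcard, and the helper names build_cube and query are hypothetical choices made for illustration. The sketch builds a toy data cube of word counts in which every roll-up cell (any dimension replaced by ALL) is materialized up front. A real OLAP system would store such aggregates in a database rather than a Python dictionary.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical wildcard meaning "aggregated over this dimension" (OLAP's ALL).
ALL = "*"


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_cube(documents):
    """Precompute a word-count cube over hypothetical dimensions (author, year, term).

    documents: iterable of (author, year, text) tuples.
    Every count is added to all 2^3 cells it contributes to, so each
    roll-up (e.g. "all years for this author") is stored in advance.
    """
    cube = Counter()
    for author, year, text in documents:
        for term, n in Counter(tokenize(text)).items():
            for cell in product((author, ALL), (year, ALL), (term, ALL)):
                cube[cell] += n
    return cube


def query(cube, author=ALL, year=ALL, term=ALL):
    """Answer an aggregate question with a single dictionary lookup."""
    return cube[(author, year, term)]  # Counter returns 0 for absent cells


# Toy corpus (illustrative data only).
docs = [
    ("Austen", 1813, "It is a truth universally acknowledged"),
    ("Austen", 1815, "Emma Woodhouse handsome clever and rich"),
    ("Dickens", 1859, "It was the best of times it was the worst of times"),
]
cube = build_cube(docs)

print(query(cube, term="it"))                      # occurrences of "it" in all documents
print(query(cube, author="Dickens", term="times"))  # "times" in Dickens only
print(query(cube, author="Austen"))                # total tokens written by Austen
```

The design trades storage for speed. Materializing every roll-up up front makes each query constant-time at the cost of a larger cube. This mirrors, in miniature, the abstract's point that stored aggregates let users explore their own questions without rescanning the whole archive.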
Taxonomy
Topics: Advanced Text Analysis Techniques · Advanced Database Systems and Queries · Web Data Mining and Analysis
