Doris: A tool for interactive exploration of historic corpora (Extended Version)
Sreya Guha

TL;DR
Doris is an interactive tool for exploring large document corpora. It integrates semantic features with information retrieval to help researchers uncover insights into social phenomena.
Contribution
The paper introduces Doris, a novel interactive exploration tool that combines semantic analysis with retrieval techniques for social science corpora.
Findings
Effective semantic feature integration improves exploration.
Visualization aids in understanding social phenomena.
Application to US presidential speeches demonstrates utility.
Abstract
Insights into a social phenomenon can be gleaned from trends and patterns in corpora of documents associated with that phenomenon. Recent years have witnessed the use of computational techniques, mostly based on keywords, to analyze large corpora for these purposes. In this paper, we extend these techniques to incorporate semantic features. We introduce Doris, an interactive exploration tool that combines semantic features with information retrieval techniques to enable exploration of document corpora corresponding to a social phenomenon. We discuss the semantic techniques and describe an implementation on a corpus of United States (US) presidential speeches. We illustrate, with examples, how the ability to combine syntactic and semantic features in a visualization helps researchers more easily gain insights into the underlying phenomenon.
Taxonomy
Topics: Advanced Text Analysis Techniques · Computational and Text Analysis Methods · Topic Modeling
