An NLP approach to quantify dynamic salience of predefined topics in a text corpus
A. Bock, A. Palladino, S. Smith-Heisters, I. Boardman, E. Pellegrini, E.J. Bienenstock, A. Valenti

TL;DR
This paper introduces an NLP-based method to measure how the importance of predefined topics fluctuates over time in large text collections, aiding social trend analysis.
Contribution
It presents a novel approach to quantify dynamic topic salience by analyzing n-gram usage patterns that deviate from a baseline in large corpora.
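As a rough illustration of what "usage patterns that deviate from a baseline" could mean in practice, the sketch below scores each topic term by how far its relative frequency in the most recent time period sits from its mean over earlier periods. The z-score, the whitespace tokenizer, and the names `ngrams`, `relative_frequencies`, and `deviation_scores` are all assumptions made for this example; the summary does not say which statistic or preprocessing the authors use.

```python
from collections import Counter
from statistics import mean, pstdev


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(docs, n):
    """Relative frequency of each n-gram across the documents of one period."""
    counts = Counter()
    for doc in docs:
        # Naive lowercase + whitespace tokenization (an assumption).
        counts.update(ngrams(doc.lower().split(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def deviation_scores(periods, topic_terms, n=1, min_std=1e-9):
    """Rank topic n-grams by z-score of the last period against earlier ones.

    `periods` is a list of time periods, each a list of document strings.
    At least two periods are needed: the earlier ones form the baseline.
    """
    freqs = [relative_frequencies(docs, n) for docs in periods]
    baseline, current = freqs[:-1], freqs[-1]
    scores = {}
    for term in topic_terms:
        history = [f.get(term, 0.0) for f in baseline]
        mu, sigma = mean(history), pstdev(history)
        # Floor the standard deviation so a flat baseline does not divide by zero.
        scores[term] = (current.get(term, 0.0) - mu) / max(sigma, min_std)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus: three periods, with "protest" appearing only in the last one.
periods = [
    ["the market opened calm", "rain fell on the city"],
    ["the market closed calm", "rain fell again today"],
    ["protest filled the market", "protest leaders spoke about the protest"],
]
ranked = deviation_scores(periods, ["protest", "rain", "market"])
print([term for term, _ in ranked])
```

In this toy corpus, "protest" ranks first because it is absent from the baseline periods, while "rain" ranks last because it disappears in the final period. A positive score suggests emergence and a negative score suggests decline, under the assumptions stated above.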
Findings
Effective identification of topic-related n-grams with abnormal usage patterns
Ability to detect emergence or decline of topics over time
Provides a ground-up view of topic dynamics in text data
Abstract
The proliferation of news media available online simultaneously presents a valuable resource and a significant challenge to analysts aiming to profile and understand social and cultural trends in a geographic location of interest. While an abundance of news reports documenting significant events, trends, and responses provides a more democratized picture of the social characteristics of a location, making sense of an entire corpus to extract significant trends is a steep challenge for any one analyst or team. Here, we present an approach using natural language processing techniques that seeks to quantify how a set of predefined topics of interest changes over time across a large corpus of text. We found that, given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to those topics and have usage patterns that deviate from a normal baseline. Emergence,…
Taxonomy
Topics: Complex Network Analysis Techniques · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
