The Keyword Explorer Suite: A Toolkit for Understanding Online Populations
Philip Feldman, Shimei Pan, James R. Foulds

TL;DR
This paper introduces a Python toolkit that leverages large language models to identify, validate, and analyze social media data, enabling detailed insights into online populations through keyword generation, content analysis, and model fine-tuning.
Contribution
It presents a novel pipeline combining GPT-3 and GPT-2 for population analysis, including keyword generation, content validation, and latent information exploration.
Findings
Effective identification of relevant social media content
Enhanced understanding of population subgroups online
Open-source toolkit for population analysis
Abstract
We have developed a set of Python applications that use large language models to identify and analyze data from social media platforms relevant to a population of interest. Our pipeline begins by using OpenAI's GPT-3 to generate potential keywords for identifying relevant text content from the target population. The keywords are then validated, and the content is downloaded and analyzed using GPT-3 embeddings and manifold reduction. Corpora are then created to fine-tune GPT-2 models, which are used to explore latent information via prompt-based queries. These tools allow researchers and practitioners to gain valuable insights into population subgroups online. Source code is available at https://github.com/pgfeldman/KeywordExplorer
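The keyword validation step can be illustrated with a minimal sketch. The abstract does not say how validation is performed, so this example assumes one plausible approach: keep only the candidate keywords that appear in at least a minimum number of sample posts. The function name `validate_keywords`, the `min_hits` threshold, and the sample data are all hypothetical and are not taken from the KeywordExplorer source.

```python
import re
from collections import Counter


def validate_keywords(candidates, posts, min_hits=2):
    """Return candidates that occur in at least `min_hits` posts.

    Assumption: validation means a simple frequency check against a
    sample of posts. The actual toolkit may validate differently.
    """
    hits = Counter()
    for post in posts:
        text = post.lower()
        for keyword in candidates:
            # Case-insensitive whole-word (or whole-phrase) match.
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, text):
                hits[keyword] += 1
    # Preserve the candidate order and drop keywords below the threshold.
    return {kw: hits[kw] for kw in candidates if hits[kw] >= min_hits}


# Hypothetical candidates, standing in for keywords suggested by a language model.
candidates = ["vaccine", "booster", "ivermectin", "quokka"]

# Hypothetical sample of posts, standing in for downloaded content.
posts = [
    "Got my booster today",
    "The vaccine rollout is slow",
    "Vaccine and booster appointments open",
    "Ivermectin claims debunked",
]

validated = validate_keywords(candidates, posts)
print(validated)
```

With this sample, only the keywords that match two or more posts survive the filter. In a real workflow the threshold would depend on the size of the sample and on how rare the target population's vocabulary is.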
Taxonomy
Topics: Misinformation and Its Impacts · Advanced Text Analysis Techniques · Topic Modeling
