Mining Hidden Populations through Attributed Search
Suhansanu Kumar, Heting Gao, Changyu Wang, Hari Sundaram, Kevin Chen-Chuan Chang

TL;DR
This paper introduces a decision tree-based Thompson sampling method to efficiently identify hidden populations on social networks by exploiting attribute correlations, outperforming existing methods in online and offline experiments.
Contribution
The paper presents a novel hierarchical query strategy using decision trees and Thompson sampling to discover hidden populations more effectively than prior approaches.
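To make the Thompson sampling ingredient concrete, here is a minimal sketch under stated assumptions. It is not the paper's method: it treats each queryable attribute value as an independent arm in a flat Beta-Bernoulli bandit and omits the decision-tree hierarchy entirely. The function name, the `hit_rates` dictionary, and the simulated oracle are all hypothetical and exist only for illustration.

```python
import random


def thompson_sample_queries(hit_rates, budget, seed=0):
    """Flat Beta-Bernoulli Thompson sampling over attribute queries.

    hit_rates: hypothetical dict mapping a query string to the probability
               that an entity returned by that query is in the hidden
               population (unknown to the sampler, used only to simulate
               the oracle).
    budget:    number of query/oracle calls allowed.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per query, stored as [alpha, beta].
    posterior = {q: [1, 1] for q in hit_rates}
    found = 0
    for _ in range(budget):
        # Draw one plausible hit rate per query; issue the best-looking query.
        query = max(posterior, key=lambda q: rng.betavariate(*posterior[q]))
        # Simulated oracle: labels the returned entity as target or not.
        is_target = rng.random() < hit_rates[query]
        # Update alpha on a hit, beta on a miss.
        posterior[query][0 if is_target else 1] += 1
        found += is_target
    return found, posterior
```

Calling `thompson_sample_queries({"a": 0.9, "b": 0.1, "c": 0.05}, budget=300)` returns the number of target entities found and the final posterior counts. The sampler spends most of its budget on whichever query keeps yielding hits while still occasionally trying the others, which is the exploration-exploitation trade-off that matters when the query budget is limited.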
Findings
Outperforms state-of-the-art samplers by 54% on Twitter.
Achieves 0.9-1.5× better performance in offline experiments.
Effectively discovers hidden populations with limited query budgets.
Abstract
Researchers often query online social platforms through their application programming interfaces (APIs) to find target populations such as people with mental illness~\cite{De-Choudhury2017} and jazz musicians~\cite{heckathorn2001finding}. Entities in such a target population satisfy a property that is typically identified using an oracle (a human or a pre-trained classifier). When the property of the target entities is not directly queryable via the API, we refer to the property as `hidden' and the population as a hidden population. Finding individuals who belong to these populations on social networks is hard because they are non-queryable, and the sampler has to explore a combinatorial query space within a finite budget. By exploiting the correlation between queryable attributes and the population of interest and by hierarchically ordering the query space, we propose a Decision…
Taxonomy
Topics: Web Data Mining and Analysis · Spam and Phishing Detection · Data Stream Mining Techniques
