Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh

TL;DR
This paper presents a novel method to overcome Twitter data scarcity by using tweet content analysis to bootstrap richer data collection from linked web pages, enabling better tracking of complex topic evolution.
Contribution
It introduces a targeted knowledge exploration approach that leverages URLs in tweets and hierarchical topic modeling to analyze complex topic dynamics beyond limited tweet content.
Findings
Effective extraction of web content linked in tweets.
Improved tracking of autism-related topic evolution.
More meaningful insights than hashtag-based analysis.
Abstract
Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a…
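The bootstrapping step described in the abstract, following URLs found in tweets and treating each linked page as a document tied to its source tweet, can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the function names `extract_urls` and `link_documents_to_tweets`, and the sample tweets are all assumptions made for illustration, and fetching and cleaning the linked pages is left out.

```python
import re
from collections import defaultdict

# Assumed pattern (not from the paper): an http(s) URL runs until whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_PATTERN.findall(tweet_text)]


def link_documents_to_tweets(tweets):
    """Map each URL to the ids of the tweets that mention it.

    `tweets` is an iterable of (tweet_id, text) pairs. Each URL key stands
    for one web document that would later be fetched and fed to the topic
    model, linked back to the tweets that pointed to it.
    """
    url_to_tweets = defaultdict(list)
    for tweet_id, text in tweets:
        for url in extract_urls(text):
            url_to_tweets[url].append(tweet_id)
    return dict(url_to_tweets)


# Hypothetical sample data for illustration only.
sample_tweets = [
    (1, "New study on autism and early diagnosis http://example.org/study."),
    (2, "Worth reading: http://example.org/study #autism"),
    (3, "No link here, just a thought about #autism"),
]
links = link_documents_to_tweets(sample_tweets)
```

Keying the result by URL rather than by tweet means a page shared in several tweets is represented as a single document, which is one plausible way to avoid counting the same content more than once.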
Taxonomy
Topics: Music and Audio Processing · Topic Modeling · Complex Network Analysis Techniques
