Predicting Central Topics in a Blog Corpus from a Networks Perspective
Srayan Datta

TL;DR
This paper introduces a method that combines probabilistic topic modeling with network centrality measures to identify the central topics in a blog corpus, an area of blogosphere analysis that the author describes as still needing much exploration.
Contribution
It proposes a novel approach that integrates probabilistic topic modeling with network analysis to discover central themes in blogs.
Findings
Identified central topics in a large blog dataset
Demonstrated the effectiveness of combining topic modeling with network measures
Provided a new framework for analyzing social and political content in blogs
Abstract
In today's content-centric Internet, blogs are becoming increasingly popular and important from a data analysis perspective. According to Wikipedia, there were over 156 million public blogs on the Internet as of February 2011. Blogs are a reflection of our contemporary society, and the contents of blog posts are important from social, psychological, economic and political perspectives. Discovery of important topics in the blogosphere is an area that still needs much exploration. We propose a procedure, based on probabilistic topic modeling and network centrality measures, that identifies the central topics in a blog corpus.
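The abstract does not spell out how the topic model and the centrality measure are connected, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes a topic model (LDA, for example) has already produced per-post topic proportions, links two topics by how strongly they co-occur in the same posts, and ranks topics by eigenvector centrality computed with power iteration. The `doc_topics` values, the co-occurrence weighting, and the function names are all hypothetical.

```python
# Hypothetical document-topic proportions, as a topic model such as LDA might
# produce them: one row per blog post, one column per topic. Values are made up.
doc_topics = [
    [0.70, 0.20, 0.10, 0.00],
    [0.10, 0.60, 0.30, 0.00],
    [0.00, 0.50, 0.10, 0.40],
    [0.30, 0.40, 0.00, 0.30],
]


def topic_graph(doc_topics):
    """Build a weighted adjacency matrix over topics.

    Assumption: the weight between topics i and j is the sum, over all posts,
    of the product of their proportions in that post (no self-loops).
    """
    k = len(doc_topics[0])
    weights = [[0.0] * k for _ in range(k)]
    for row in doc_topics:
        for i in range(k):
            for j in range(k):
                if i != j:
                    weights[i][j] += row[i] * row[j]
    return weights


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    Scores are rescaled to sum to 1 after every step.
    """
    k = len(weights)
    scores = [1.0 / k] * k
    for _ in range(iterations):
        new = [sum(weights[i][j] * scores[j] for j in range(k)) for i in range(k)]
        total = sum(new)
        if total == 0:
            return scores  # graph has no edges; keep the uniform scores
        new = [v / total for v in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


weights = topic_graph(doc_topics)
centrality = eigenvector_centrality(weights)

# Topic indices ordered from most to least central.
ranking = sorted(range(len(centrality)), key=lambda t: centrality[t], reverse=True)
print(ranking)
```

In this toy data, topic 1 comes out as the most central because it co-occurs substantially with every other topic. Other choices, such as thresholding edges or using degree or betweenness centrality, would fit the same outline.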
Taxonomy
Topics: Advanced Text Analysis Techniques · Complex Network Analysis Techniques · Web Data Mining and Analysis
