A Proposed Architecture for Continuous Web Monitoring Through Online Crawling of Blogs

Mehdi Naghavi; Mohsen Sharifi

arXiv:1202.1837·cs.IR·February 10, 2012

Open Access

TL;DR

This paper presents an architecture for continuous online crawling of blogs to monitor web content in real-time, aiding analysts in timely decision-making by focusing on relevant blog data through a weighted graph-based focused crawler.

Contribution

It introduces a novel architecture utilizing a focused crawler and weighted graph to efficiently monitor and analyze blogs in real-time, addressing the challenge of vast web data.

Findings

01 Effective continuous blog monitoring achieved

02 Weighted graph improves relevance of fetched data

03 Real-time analysis supports timely decision-making

Abstract

Being informed in time of what is published on the Web can greatly help psychologists, marketers, and political analysts to understand, analyse, decide, and act correctly on society's different needs. The sheer volume of information on the Web prevents continuous online investigation of the whole Web. Restricting attention to a selected set of blogs limits the working domain and makes online crawling of the Web feasible. This article proposes an architecture that continuously crawls the related blogs online, using a focused crawler, and then investigates and analyses the obtained data. Online fetching is driven by the latest announcements from ping servers. A weighted graph is formed by targeting the important key phrases, so that a focused crawler can fetch the complete texts of the related Web pages,…
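The abstract does not say how the weighted graph is built or how it steers the crawler, so the following is only a minimal sketch under stated assumptions: edge weights are taken to be co-occurrence counts of key phrases in a small seed corpus, and a ping announcement is taken to be a (URL, summary) pair. The names `build_phrase_graph`, `relevance`, and `select_for_fetch`, as well as the threshold, are hypothetical and not drawn from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_phrase_graph(seed_docs, key_phrases):
    """Assumed weighting: an edge counts the seed documents in which both phrases occur."""
    graph = defaultdict(dict)
    for doc in seed_docs:
        text = doc.lower()
        present = sorted(p for p in key_phrases if p in text)
        for a, b in combinations(present, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def relevance(text, graph, key_phrases):
    """Score = number of matched phrases plus the weights of edges between them."""
    text = text.lower()
    present = sorted(p for p in key_phrases if p in text)
    score = float(len(present))
    for a, b in combinations(present, 2):
        score += graph.get(a, {}).get(b, 0)
    return score


def select_for_fetch(pings, graph, key_phrases, threshold):
    """Keep the URLs of announced posts whose summary scores at or above the threshold."""
    return [url for url, summary in pings
            if relevance(summary, graph, key_phrases) >= threshold]


# Hypothetical seed corpus and ping announcements, for illustration only.
key_phrases = ["election", "voter turnout", "campaign"]
seed_docs = [
    "The election campaign and voter turnout",
    "Voter turnout in the election",
    "Recipe for soup",
]
graph = build_phrase_graph(seed_docs, key_phrases)
pings = [
    ("http://blog.example/a", "Election day: voter turnout is high"),
    ("http://blog.example/b", "My soup recipe"),
]
selected = select_for_fetch(pings, graph, key_phrases, threshold=2.0)
```

With these inputs only the first announced post passes the threshold, so the focused crawler would fetch its full text and skip the second. A real system would likely use a richer weighting and proper phrase matching rather than substring tests.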

Taxonomy

Topics: Web Data Mining and Analysis