Navigating the Small World Web by Textual Cues
Filippo Menczer

TL;DR
This paper explores how Web crawlers can use textual cues, together with link structure, to navigate efficiently to relevant pages they have not seen before.
Contribution
It introduces a method that leverages textual cues alongside link structure for decentralized Web navigation, bridging empirical and theoretical approaches.
Findings
Textual cues can guide a crawler toward relevant pages.
Decentralized navigation algorithms based on such cues may discover efficient paths.
Link structure is related to a content-induced topology of Web pages.
Abstract
Can a Web crawler efficiently locate an unknown relevant page? While this question is receiving much empirical attention due to its considerable commercial value in the search engine community [Cho98,Chakrabarti99,Menczer00,Menczer01], theoretical efforts to bound the performance of focused navigation have only exploited the link structure of the Web graph, neglecting other features [Kleinberg01,Adamic01,Kim02]. Here I investigate the connection between linkage and a content-induced topology of Web pages, suggesting that efficient paths can be discovered by decentralized navigation algorithms based on textual cues.
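To make the idea of decentralized navigation by textual cues concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the toy graph, the page texts, and the names `cosine` and `greedy_navigate` are all hypothetical, and textual similarity is approximated by cosine similarity over bag-of-words counts. At each step the crawler uses only local information (the outgoing links of the current page and a text cue for each linked page) and follows the link whose text is most similar to the query.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_navigate(links, text, start, query, is_target, max_steps=10):
    """Follow, at each step, the unvisited out-link whose text best matches the query.

    Returns the list of pages visited if a target is reached, otherwise None.
    Only the current page's out-links are inspected, so the walk is decentralized.
    """
    q = Counter(query.lower().split())
    path, visited, current = [start], {start}, start
    for _ in range(max_steps):
        if is_target(current):
            return path
        candidates = [n for n in links.get(current, []) if n not in visited]
        if not candidates:
            return None  # dead end: no unvisited out-links
        # Textual cue: similarity between the query and each linked page's text.
        current = max(candidates,
                      key=lambda n: cosine(q, Counter(text[n].lower().split())))
        visited.add(current)
        path.append(current)
    return path if is_target(current) else None


# Hypothetical toy Web graph: page -> out-links, and page -> text.
links = {
    "home": ["sports", "science"],
    "sports": ["football"],
    "science": ["physics", "networks"],
    "networks": ["small_world"],
    "physics": [], "football": [], "small_world": [],
}
text = {
    "home": "news portal home",
    "sports": "sports football scores",
    "science": "science research graph theory",
    "physics": "physics particles energy",
    "networks": "complex networks graph small world",
    "small_world": "small world graph navigation web",
    "football": "football league",
}

path = greedy_navigate(links, text, "home", "small world graph navigation",
                       is_target=lambda page: page == "small_world")
print(path)
```

In this sketch the crawler scores linked pages by their full text for simplicity; a real crawler would more plausibly rely on cues visible from the current page, such as anchor text, before fetching a link.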
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Web visibility and informetrics
