An Evaluation of Link Neighborhood Lexical Signatures to Rediscover Missing Web Pages
Jeb Ware, Martin Klein, Michael L. Nelson

TL;DR
This paper presents a method for rediscovering missing web pages by building a four-word lexical signature from the text of as few as ten backlink pages. The method located over half of the pages tested.
Contribution
It introduces an approach that generates a lexical signature for a missing web page from its backlink neighborhood, so recovery does not depend on a signature computed before the page was lost or on cached or archived copies.
Findings
A four-word lexical signature can be constructed from only ten backlink pages.
The method successfully rediscovered over 50% of missing pages.
Only first-level backlinks are effective for this approach.
Abstract
For discovering the new URI of a missing web page, lexical signatures, which consist of a small number of words chosen to represent the "aboutness" of a page, have been previously proposed. However, prior methods relied on computing the lexical signature before the page was lost, or using cached or archived versions of the page to calculate a lexical signature. We demonstrate a system for constructing a lexical signature for a page from its link neighborhood, that is, the "backlinks", or pages that link to the missing page. After testing various methods, we show that one can construct a lexical signature for a missing web page using only ten backlink pages. Further, we show that only the first level of backlinks is useful in this effort. The text that the backlinks use to point to the missing page is used as input for the creation of a four-word lexical signature. That lexical signature…
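The signature construction described in the abstract can be illustrated with a minimal sketch. It assumes the anchor texts of the backlinks have already been collected, and it ranks terms by how many backlinks use them. That ranking is a simplification chosen for illustration, not necessarily the weighting the authors evaluated. The function name `anchor_text_signature`, the stop-word list, and the sample anchor texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real system would use a fuller list.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "for", "on", "at",
    "is", "from", "this", "here", "click",
}


def anchor_text_signature(anchor_texts, size=4, max_backlinks=10):
    """Return up to `size` terms, ranked by the number of backlinks using them.

    anchor_texts: one string per backlink page, holding the text that page
    uses to link to the missing page.
    """
    counts = Counter()
    for text in anchor_texts[:max_backlinks]:
        # Count each term once per backlink so a single verbose page
        # cannot dominate the signature.
        terms = set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS
        counts.update(terms)
    # Highest count first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]


# Made-up anchor texts, for illustration only.
anchors = [
    "Hubble telescope images",
    "images from the Hubble space telescope",
    "Hubble gallery",
    "click here for telescope images",
]
signature = anchor_text_signature(anchors)
query = " ".join(signature)
```

The resulting `query` string is what would be submitted to a search engine in the hope that the page's new URI appears among the results.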
Taxonomy
Topics: Web Data Mining and Analysis · Web visibility and informetrics · Advanced Text Analysis Techniques
