Navigation Objects Extraction for Better Content Structure Understanding
Kui Zhao, Bangpeng Li, Zilun Peng, Jiajun Bu, Can Wang

TL;DR
This paper introduces a novel method for extracting both static and dynamic navigation objects from webpages, improving the understanding of content structure, especially in web 2.0 sites with personalized and dynamic elements.
Contribution
The paper presents a new extraction approach that combines hyperlink clustering and SVM classification to identify navigation objects, including dynamic and personalized lists.
Findings
Effective in extracting static and dynamic navigation objects
Validated on diverse real-world webpages
Improves content structure understanding in web 2.0 sites
Abstract
Existing work on extracting navigation objects from webpages focuses on navigation menus, so as to reveal the information architecture of the site. However, web 2.0 sites such as social networks and e-commerce portals are making the content structure of a web site increasingly difficult to understand. Dynamic and personalized elements in a webpage, such as top stories and recommended lists, are vital to understanding the dynamic nature of web 2.0 sites. To better understand the content structure of web 2.0 sites, in this paper we propose a new extraction method for navigation objects in a webpage. Our method extracts not only the static navigation menus, but also the dynamic and personalized page-specific navigation lists. Since the navigation objects in a webpage naturally come in blocks, we first cluster hyperlinks into different blocks by exploiting spatial locations of…
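The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so the paper's actual features and clustering algorithm are not known from this page; the `Link` type, the greedy grouping rule, and the `max_gap` and `max_x_shift` thresholds below are all hypothetical choices made for illustration. The later SVM classification stage is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """A hyperlink with its rendered position (hypothetical representation)."""
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels


def cluster_links(links: List[Link],
                  max_gap: float = 30.0,
                  max_x_shift: float = 20.0) -> List[List[Link]]:
    """Greedily group links into vertically stacked blocks.

    A link joins the first existing block whose most recent link is at most
    `max_gap` pixels above it and horizontally offset by at most
    `max_x_shift` pixels; otherwise it starts a new block. Both thresholds
    are illustrative assumptions, not values from the paper.
    """
    blocks: List[List[Link]] = []
    # Visit links top to bottom, then left to right.
    for link in sorted(links, key=lambda l: (l.y, l.x)):
        for block in blocks:
            last = block[-1]
            if (link.y - last.y) <= max_gap and abs(link.x - last.x) <= max_x_shift:
                block.append(link)
                break
        else:
            # No nearby block found: this link opens a new one.
            blocks.append([link])
    return blocks
```

This sketch only handles vertically stacked lists; a horizontal menu, where links share a row but differ in `x`, would need a different proximity rule. In a pipeline like the one the paper describes, each resulting block would then be turned into features and passed to a classifier that decides whether it is a navigation object.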
Taxonomy
Methods: Support Vector Machine
