Comparing the hierarchy of keywords in on-line news portals
Gergely Tibély, David Sousa-Rodrigues, Péter Pollner, Gergely Palla

TL;DR
This paper investigates the latent hierarchical structure of keywords in online news portals using a co-occurrence-based method, revealing significant differences in topic importance and network structure across portals.
Contribution
It applies a recent co-occurrence-based tag hierarchy extraction method to keywords from four online news portals, uncovering discrepancies in topic importance and network structure among the portals.
Findings
Hierarchies differ significantly across news portals.
The portals differ in which topics sit at the top of the hierarchy (rendered as important) and which sit low in it (of less interest).
Underlying network structures vary notably between sources.
Abstract
The tagging of on-line content with informative keywords is a widespread phenomenon, from scientific article repositories through blogs to on-line news portals. In most cases, the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords in a collection of news items are unknown. However, in most cases the topics and concepts described by these keywords form a latent hierarchy, with the more general topics and categories at the top, and more specialised ones at the bottom. Here we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the…
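The abstract does not spell out the extraction method, so the following is only a minimal sketch of the general idea behind co-occurrence-based hierarchy extraction, not the authors' algorithm. It assumes a simple heuristic: each keyword is attached to the keyword it co-occurs with most often among those that are more frequent overall, so that general topics end up near the top. The function name `extract_hierarchy`, the tie-breaking rules, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_hierarchy(items):
    """Map each tag to a parent tag (or None for a root).

    Illustrative heuristic only (an assumption, not the paper's method):
    a tag's parent is the co-occurring tag with the highest co-occurrence
    count among tags that rank as more general.
    """
    freq = Counter()   # number of items each tag appears on
    cooc = Counter()   # number of items each ordered tag pair shares
    for tags in items:
        unique = sorted(set(tags))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    def rank(tag):
        # More frequent tags rank first; names break ties so that the
        # ordering is total and the result cannot contain cycles.
        return (-freq[tag], tag)

    parent = {}
    for tag in freq:
        candidates = [
            other for other in freq
            if cooc[(tag, other)] > 0 and rank(other) < rank(tag)
        ]
        if not candidates:
            parent[tag] = None  # no more general co-occurring tag: root
            continue
        # Prefer the strongest co-occurrence; on ties prefer the less
        # frequent (more specific) candidate, then the alphabetically first.
        parent[tag] = min(
            candidates,
            key=lambda o: (-cooc[(tag, o)], freq[o], o),
        )
    return parent


# Hypothetical keyword sets, one list per news item.
items = [
    ["politics", "election"],
    ["politics", "election", "parliament"],
    ["politics", "parliament"],
    ["politics", "economy"],
    ["sport", "football"],
    ["sport", "football", "transfer"],
    ["sport", "tennis"],
]

hierarchy = extract_hierarchy(items)
roots = sorted(tag for tag, p in hierarchy.items() if p is None)
```

Running the same function on keyword sets from different portals and comparing the resulting roots and parent links is one simple way to see how hierarchies can differ between sources, under the assumptions stated above.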
