NewsHomepages: Homepage Layouts Capture Information Prioritization Decisions
Ben Welsh, Naitian Zhou, Arda Kaz, Michael Vu, Alexander Spangher

TL;DR
This paper introduces NewsHomepages, a large dataset of news website homepages captured over three years, along with models that infer the relative importance of news items. It demonstrates applications in ranking local policies and in understanding information prioritization.
Contribution
The work provides a new dataset of homepage layouts and develops models for inferring news significance, with broader implications for understanding organizational information prioritization.
Findings
Created a dataset of 3,000+ news website homepages, captured twice daily over three years
Developed models for pairwise comparison of news importance
Applied models to rank local policies by newsworthiness
Abstract
Information prioritization plays an important role in how humans perceive and understand the world. Homepage layouts serve as a tangible proxy for this prioritization. In this work, we present NewsHomepages, a large dataset of over 3,000 news website homepages (including local, national and topic-specific outlets) captured twice daily over a three-year period. We develop models to perform pairwise comparisons between news items to infer their relative significance. To illustrate that modeling organizational hierarchies has broader implications, we apply our models to rank-order a collection of local city council policies passed over a ten-year period in San Francisco, assessing their "newsworthiness". Our findings lay the groundwork for leveraging implicit organizational cues to deepen our understanding of information prioritization.
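To make the step from pairwise comparisons to a rank order concrete, here is a minimal sketch. It is not the authors' method: the abstract does not say how pairwise outputs are aggregated, so the average-win-probability scoring below is an assumption, and `rank_by_pairwise`, `toy_comparator`, and the policy titles are hypothetical names invented for illustration. The toy comparator stands in for a learned model that would return the probability that one item is placed more prominently than another.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def rank_by_pairwise(
    items: Sequence[str],
    prob_a_over_b: Callable[[str, str], float],
) -> List[str]:
    """Order items by their summed win probability against every other item.

    Assumes items are unique and that prob_a_over_b(a, b) returns a value
    in [0, 1], read as "probability that a is more newsworthy than b".
    """
    scores: Dict[str, float] = {item: 0.0 for item in items}
    for a, b in combinations(items, 2):
        p = prob_a_over_b(a, b)  # hypothetical model output
        scores[a] += p
        scores[b] += 1.0 - p
    # A higher total means the item was preferred more often overall.
    return sorted(items, key=lambda item: scores[item], reverse=True)


def toy_comparator(a: str, b: str) -> float:
    """Stand-in for a learned comparator: longer titles 'win'. Illustrative only."""
    return len(a) / (len(a) + len(b))


# Made-up policy titles, not taken from the paper's San Francisco data.
policies = [
    "Rezoning plan",
    "Park bench repair",
    "Citywide housing bond measure",
]
ranking = rank_by_pairwise(policies, toy_comparator)
print(ranking)
```

Summing win probabilities is only one way to aggregate; a Bradley-Terry fit or a sort driven directly by the comparator would be reasonable alternatives, and the source does not indicate which, if any, was used.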
Taxonomy
Topics: Web Data Mining and Analysis · Data Quality and Management
