History by Diversity: Helping Historians search News Archives
Jaspreet Singh, Wolfgang Nejdl, Avishek Anand

TL;DR
This paper introduces HistDiv, a novel algorithm that models historical search intent to diversify news archive results across aspects and important time periods, improving recall and user satisfaction.
Contribution
It defines the concept of Historical Query Intent and develops HistDiv, a new diversification algorithm tailored for temporal and aspect-based news archive searches.
Findings
HistDiv outperforms competitors in subtopic recall.
Users prefer HistDiv's ranking despite a slight loss in precision.
Temporal and aspect diversification enhances historical search effectiveness.
Abstract
Longitudinal corpora like newspaper archives are of immense value to historical research, and time, as an important factor for historians, strongly influences their search behaviour in these archives. When searching for articles published over time, a key preference is to retrieve documents that cover the important aspects from important points in time, which differs from standard search behaviour. To support this search strategy, we introduce the notion of a Historical Query Intent to explicitly model a historian's search task and define an aspect-time diversification problem over news archives. We present a novel algorithm, HistDiv, that explicitly models the aspects and important time windows based on a historian's information seeking behaviour. By incorporating temporal priors based on publication times and temporal expressions, we diversify both on the aspect and temporal…
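To make the idea of diversifying over both aspects and time windows concrete, here is a minimal sketch of a generic greedy coverage-based re-ranker. It is not HistDiv itself, whose scoring is not given in this summary. The `Doc` fields, the `window_prior` weights, and the gain formula (relevance times window prior times the number of newly covered aspect-window pairs) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    relevance: float      # retrieval score, assumed to lie in [0, 1]
    aspects: frozenset    # aspect labels the document covers (hypothetical)
    window: str           # time window the document falls in (hypothetical)


def greedy_aspect_time_diversify(docs, window_prior, k):
    """Greedily pick up to k docs, favouring uncovered (aspect, window) pairs.

    window_prior maps a time window to an assumed importance weight.
    """
    covered = set()
    remaining = list(docs)
    ranking = []
    while remaining and len(ranking) < k:

        def gain(d):
            # Count the (aspect, window) pairs this document would newly cover.
            new_pairs = sum(1 for a in d.aspects if (a, d.window) not in covered)
            return d.relevance * window_prior.get(d.window, 0.0) * new_pairs

        # Break ties on gain by plain relevance.
        best = max(remaining, key=lambda d: (gain(d), d.relevance))
        ranking.append(best)
        remaining.remove(best)
        covered.update((a, best.window) for a in best.aspects)
    return ranking


# Toy data, invented for this example only.
docs = [
    Doc("d1", 0.9, frozenset({"election"}), "1990"),
    Doc("d2", 0.8, frozenset({"election"}), "1990"),
    Doc("d3", 0.6, frozenset({"scandal"}), "1994"),
    Doc("d4", 0.5, frozenset({"election"}), "1994"),
]
window_prior = {"1990": 0.6, "1994": 0.4}

ranking = greedy_aspect_time_diversify(docs, window_prior, k=3)
print([d.doc_id for d in ranking])
```

In this toy run the second-most relevant document, `d2`, is skipped because it repeats an aspect and time window already covered by `d1`, while lower-scored documents from a different window are promoted. That is the trade of a little precision for broader coverage that the findings above describe.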
Taxonomy
Topics: Information Retrieval and Search Behavior · Web Data Mining and Analysis · Data Management and Algorithms
