News Deja Vu: Connecting Past and Present with Semantic Search
Brevin Franklin, Emily Silcock, Abhishek Arora, Tom Bryan, Melissa Dell

TL;DR
News Deja Vu is a semantic search tool that uses a transformer-based bi-encoder to find historical news articles similar to modern news queries, helping social scientists explore parallels across time.
Contribution
It introduces a novel, scalable semantic search method using transformer-based bi-encoders with entity masking for historical news analysis.
Findings
Effective retrieval of historical news articles similar to modern queries.
Handles large-scale, noisy historical datasets with OCR errors.
Accessible for social scientists without deep learning expertise.
Abstract
Social scientists and the general public often analyze contemporary events by drawing parallels with the past, a process complicated by the vast, noisy, and unstructured nature of historical texts. For example, hundreds of millions of page scans from historical newspapers have been noisily transcribed. Traditional sparse methods for searching for relevant material in these vast corpora, e.g., with keywords, can be brittle given complex vocabularies and OCR noise. This study introduces News Deja Vu, a novel semantic search tool that leverages transformer large language models and a bi-encoder approach to identify historical news articles that are most similar to modern news queries. News Deja Vu first recognizes and masks entities, in order to focus on broader parallels rather than the specific named entities being discussed. Then, a contrastively trained, lightweight bi-encoder…
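The flow described in the abstract (mask entities, embed, retrieve the most similar historical articles) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the capitalization heuristic in `mask_entities` stands in for a real named entity recognizer, the hashed bag-of-words `embed` stands in for the contrastively trained bi-encoder, and the function names, the `[MASK]` token, and the sample articles are hypothetical.

```python
import hashlib
import math
import re

MASK = "[MASK]"  # hypothetical mask token, chosen for this sketch


def mask_entities(text):
    """Toy stand-in for NER: mask capitalized words that do not start a sentence."""
    out = []
    sentence_start = True
    for tok in text.split():
        word = tok.strip(".,;:!?\"'()")
        if word[:1].isupper() and not sentence_start:
            # Collapse a run of capitalized words into a single mask.
            if not out or out[-1] != MASK:
                out.append(MASK)
        else:
            out.append(tok)
        sentence_start = tok.endswith((".", "!", "?"))
    return " ".join(out)


def embed(text, dim=256):
    """Toy stand-in for a bi-encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    cleaned = text.replace(MASK, " ").lower()
    for word in re.findall(r"[a-z]+", cleaned):
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query, corpus, k=1):
    """Return the k (score, article) pairs with the highest cosine similarity."""
    q = embed(mask_entities(query))
    scored = []
    for doc in corpus:
        d = embed(mask_entities(doc))
        scored.append((sum(a * b for a, b in zip(q, d)), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Invented sample articles, for illustration only.
    historical = [
        "Striking workers at the steel mill demand higher wages from Carnegie.",
        "Floods destroy crops along the river in Ohio.",
        "Senator Smith opens new railway bridge.",
    ]
    modern = "Workers at the Amazon warehouse demand higher wages."
    for score, article in search(modern, historical, k=2):
        print(round(score, 3), article)
```

Because both sides are masked before embedding, the query about a modern company can match an article about a different employer from another era. In a real system the article embeddings would be computed once and stored in a nearest-neighbor index, which is what makes a bi-encoder practical for very large corpora.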
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Focus
