From Noise to Signal: When Outliers Seed New Topics
Evangelia Zve, Gauvain Bourgne, Benjamin Icard, Jean-Gabriel Ganascia

TL;DR
This paper shows that some outliers in dynamic topic modeling act as early indicators of emerging topics, linking weak-signal detection with temporal topic modeling on news data.
Contribution
It introduces a temporal taxonomy of document trajectories that distinguishes anticipatory outliers from reinforcing or isolated documents, enhancing early detection of new topics.
Findings
High consensus on anticipatory outliers among models
Retrospective evaluation on HydroNewsFr, a French news corpus on the hydrogen economy
Qualitative case studies illustrate trajectory types
Abstract
Outliers in dynamic topic modeling are typically treated as noise, yet we show that some can serve as early signals of emerging topics. We introduce a temporal taxonomy of news-document trajectories that defines how documents relate to topic formation over time. It distinguishes anticipatory outliers, which precede the topics they later join, from documents that either reinforce existing topics or remain isolated. By capturing these trajectories, the taxonomy links weak-signal detection with temporal topic modeling and clarifies how individual articles anticipate, initiate, or drift within evolving clusters. We implement it in a cumulative clustering setting using document embeddings from eleven state-of-the-art language models and evaluate it retrospectively on HydroNewsFr, a French news corpus on the hydrogen economy. Inter-model agreement reveals a small, high-consensus subset of…
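The taxonomy described above can be illustrated with a minimal sketch. It assumes each document has one cluster label per cumulative clustering step, with -1 marking an outlier (a convention borrowed from HDBSCAN-style clusterers, not confirmed by this abstract) and None marking steps before the document was published. The function names, the label strings, and the 0.8 agreement threshold are hypothetical choices for illustration; the paper's own definitions may be stricter, for example by requiring that the joined topic did not yet exist when the document appeared.

```python
from collections import Counter
from typing import Dict, List, Optional

OUTLIER = -1  # assumed outlier label; not taken from the paper


def classify_trajectory(labels: List[Optional[int]]) -> str:
    """Classify one document from its labels across cumulative steps.

    labels[t] is the cluster label at step t, or None if the
    document had not yet been published at that step.
    """
    observed = [label for label in labels if label is not None]
    if not observed:
        raise ValueError("document never appears in any step")
    if observed[0] != OUTLIER:
        # Assigned to a topic on arrival: it reinforces that topic.
        return "reinforcing"
    if any(label != OUTLIER for label in observed[1:]):
        # Starts as an outlier, later absorbed into a topic.
        return "anticipatory"
    # Outlier at every observed step.
    return "isolated"


def high_consensus(
    per_model: List[Dict[str, str]],
    target: str = "anticipatory",
    min_share: float = 0.8,
) -> List[str]:
    """Return document ids that at least min_share of models label as target."""
    if not per_model:
        return []
    counts = Counter(
        doc_id
        for model_result in per_model
        for doc_id, trajectory in model_result.items()
        if trajectory == target
    )
    n_models = len(per_model)
    return sorted(d for d, n in counts.items() if n / n_models >= min_share)
```

Running classify_trajectory once per embedding model and passing the per-model results to high_consensus would give the kind of inter-model agreement subset the abstract mentions, under the simplifying assumptions stated above.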
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Sentiment Analysis and Opinion Mining
