Topic Detection and Tracking with Time-Aware Document Embeddings
Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy

TL;DR
This paper introduces a neural approach that combines temporal and textual information into unified document embeddings for improved event detection in Topic Detection and Tracking systems, demonstrating significant performance gains.
Contribution
The work presents a novel time-aware document embedding method fine-tuned with triplet loss, effectively integrating temporal and semantic data for TDT tasks.
Findings
Significant improvements over baselines on the News2013 dataset.
Enhanced online TDT pipeline performance with the new embeddings.
Better handling of recurring events compared to previous systems.
Abstract
The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not well capture how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture,…
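As a rough illustration of the idea in the abstract, the sketch below fuses a text vector with an encoding of the publication date and scores a triplet of documents with a standard triplet loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the sinusoidal date encoding, the concatenation-based fusion, the Euclidean distance, the margin value, and the names `time_encoding`, `time_aware_embedding`, and `triplet_loss` are all hypothetical choices made for the example.

```python
import numpy as np


def time_encoding(day, dim=8, max_period=10000.0):
    """Sinusoidal encoding of a day index (assumed encoding, for illustration only)."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (max_period ** (2 * i / dim))
    angles = day * freqs
    # First half holds sines, second half holds cosines.
    return np.concatenate([np.sin(angles), np.cos(angles)])


def time_aware_embedding(text_vec, day, time_dim=8):
    """Fuse text and time by simple concatenation (assumed fusion strategy)."""
    text_vec = np.asarray(text_vec, dtype=float)
    return np.concatenate([text_vec, time_encoding(day, time_dim)])


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin), Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, float(d_pos - d_neg + margin))


# Toy usage: anchor and positive cover the same event a day apart;
# the negative has similar text but was published months later.
anchor = time_aware_embedding([0.9, 0.1, 0.0], day=100)
positive = time_aware_embedding([0.8, 0.2, 0.0], day=101)
negative = time_aware_embedding([0.9, 0.1, 0.1], day=260)
loss = triplet_loss(anchor, positive, negative)
```

In a fine-tuning setup, a loss of this shape would be minimized over many (anchor, positive, negative) triplets so that stories about the same event end up closer together than stories about different events.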
Taxonomy
Topics: Advanced Text Analysis Techniques · Complex Network Analysis Techniques · Topic Modeling
Methods: Triplet Loss
