BiTimeBERT: Extending Pre-Trained Language Representations with Bi-Temporal Information
Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa, Yi Cai

TL;DR
BiTimeBERT is a new language model trained on temporal news data that incorporates two types of temporal signals, significantly improving performance on time-sensitive NLP tasks compared to standard models like BERT.
Contribution
This work introduces BiTimeBERT, a novel pre-training approach that leverages long-span temporal news collections and two new tasks to create time-aware language representations.
Findings
BiTimeBERT outperforms BERT on various time-sensitive NLP tasks.
Achieves a 155% relative accuracy improvement over BERT on the event time estimation task.
Demonstrates the importance of temporal signals in language modeling.
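A 155% accuracy gain can only be a relative improvement, since accuracy itself cannot exceed 100%. The sketch below shows the arithmetic with made-up accuracies. The baseline and improved values are hypothetical and are not figures reported in the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical accuracies for illustration only (not taken from the paper).
baseline_acc = 0.20
improved_acc = 0.51
print(f"{relative_improvement(baseline_acc, improved_acc):.0f}%")
```

Read this way, a 155% improvement means the improved accuracy is about 2.55 times the baseline, not 155 percentage points higher.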
Abstract
Time is an important aspect of documents and is used in a range of NLP and IR tasks. In this work, we investigate methods for incorporating temporal information during pre-training to further improve the performance on time-related tasks. Compared with common pre-trained language models like BERT, which utilize synchronic document collections (e.g., BookCorpus and Wikipedia) as the training corpora, we use a long-span temporal news article collection for building word representations. We introduce BiTimeBERT, a novel language representation model trained on a temporal collection of news articles via two new pre-training tasks, which harnesses two distinct temporal signals to construct time-aware language representations. The experimental results show that BiTimeBERT consistently outperforms BERT and other existing pre-trained models with substantial gains on different downstream NLP tasks…
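The abstract does not spell out the two temporal signals or the two pre-training tasks. As a rough sketch, assume the signals are (a) the article's publication timestamp and (b) temporal expressions inside the text, with one task masking the in-text expressions and the other predicting the publication date. The function name `build_example`, the year-only regex, and the month-level label are all hypothetical simplifications, not the authors' implementation.

```python
import re
from datetime import date

# Assumption: only four-digit years (1900-2099) count as temporal expressions.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")
MASK_TOKEN = "[MASK]"


def build_example(text: str, published: date) -> dict:
    """Build one toy pre-training example from an article and its timestamp."""
    # Signal (b): temporal expressions in the content become masking targets.
    targets = YEAR_PATTERN.findall(text)
    masked_text = YEAR_PATTERN.sub(MASK_TOKEN, text)
    return {
        "masked_text": masked_text,
        "mask_targets": targets,
        # Signal (a): the publication timestamp becomes a classification label
        # (month granularity is an assumption made for this sketch).
        "date_label": published.strftime("%Y-%m"),
    }


example = build_example(
    "The treaty signed in 1987 was revisited in 1991.",
    date(1991, 7, 31),
)
print(example["masked_text"])
print(example["mask_targets"])
print(example["date_label"])
```

A real pipeline would work on WordPiece tokens and use a proper temporal tagger instead of a regex. The point here is only that each article can supply two separate supervision signals.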
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Attention Dropout · WordPiece · Weight Decay · Adam · Softmax · Layer Normalization
