Temporal Attention for Language Models
Guy D. Rosin, Kira Radinsky

TL;DR
This paper introduces a temporal attention mechanism for transformer-based language models, enabling them to incorporate time information and improve performance on semantic change detection across multiple languages.
Contribution
It proposes a novel time-aware self-attention mechanism that enhances language models with temporal context, improving their ability to detect semantic change.
Findings
Achieved state-of-the-art results on semantic change detection datasets
Successfully applied to BERT across three languages
Demonstrated effectiveness across diverse datasets in different languages
Abstract
Pretrained language models based on the transformer architecture have shown great success in NLP. Textual training data often comes from the web and is thus tagged with time-specific information, but most language models ignore this information. They are trained on the textual data alone, limiting their ability to generalize temporally. In this work, we extend the key component of the transformer architecture, i.e., the self-attention mechanism, and propose temporal attention - a time-aware self-attention mechanism. Temporal attention can be applied to any transformer model and requires the input texts to be accompanied with their relevant time points. It allows the transformer to capture this temporal information and create time-specific contextualized word representations. We leverage these representations for the task of semantic change detection; we apply our proposed mechanism to…
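The abstract above is truncated and does not give the formula for temporal attention, so the following is only a minimal sketch of one way a self-attention layer could be made time-aware. It assumes each token carries a time embedding for its text's time point (the rows of a hypothetical matrix `t`), and that these embeddings enter the attention scores as a d x d mixing matrix `T^T T / ||T||` placed between queries and keys. The function name `temporal_attention`, the helper functions, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(row):
    # Numerically stable softmax over one row of scores.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(q, k, v, t):
    """Scaled dot-product attention with an ASSUMED time-mixing term.

    q, k, v: n x d lists (queries, keys, values).
    t: n x d list of time embeddings, one row per token (hypothetical input).
    Assumes t is not all zeros, since its norm is used as a divisor.
    """
    d_k = len(k[0])
    # Frobenius norm of the time-embedding matrix.
    norm = math.sqrt(sum(x * x for row in t for x in row))
    # d x d matrix summarising the time embeddings (assumed form).
    time_mix = [[x / norm for x in row] for row in matmul(transpose(t), t)]
    # Scores are q @ time_mix @ k^T, scaled by sqrt(d_k) as in standard attention.
    scores = matmul(matmul(q, time_mix), transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy inputs: the same two tokens, tagged with two different time points.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
t_a = [[1.0, 0.0], [1.0, 0.0]]  # both tokens at hypothetical time point "a"
t_b = [[0.0, 1.0], [0.0, 1.0]]  # both tokens at hypothetical time point "b"

out_a, w_a = temporal_attention(q, k, v, t_a)
out_b, w_b = temporal_attention(q, k, v, t_b)
```

With identical queries, keys, and values, the two calls produce different attention weights and outputs because only the time embeddings differ, which is the sense in which the resulting representations are time-specific.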
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Computational and Text Analysis Methods
Methods: Linear Layer · Dense Connections · Weight Decay · Residual Connection · Softmax · Linear Warmup With Linear Decay · Adam · Attention Dropout · Multi-Head Attention
