TL;DR
HistBERT is a pre-trained language model trained on historical corpus data. It shows improved performance in diachronic semantic change analysis compared to the original BERT, highlighting the importance of temporal context in semantic analysis.
Contribution
This work introduces HistBERT, a BERT-based model trained on historical data to enhance diachronic semantic analysis, and compares its effectiveness with that of standard BERT.
Findings
HistBERT outperforms BERT in semantic shift detection.
Training on historical data improves diachronic semantic analysis.
Effectiveness depends on the temporal profile of input texts.
Abstract
Contextualized word embeddings have demonstrated state-of-the-art performance in various natural language processing tasks including those that concern historical semantic change. However, language models such as BERT were trained primarily on contemporary corpus data. To investigate whether training on historical corpus data improves diachronic semantic analysis, we present a pre-trained BERT-based language model, HistBERT, trained on the balanced Corpus of Historical American English. We examine the effectiveness of our approach by comparing the performance of the original BERT and that of HistBERT, and we report promising results in word similarity and semantic shift analysis. Our work suggests that the effectiveness of contextual embeddings in diachronic semantic analysis is dependent on the temporal profile of the input text and care should be taken in applying this methodology to…
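As a rough illustration of what semantic shift analysis with contextual embeddings can look like, the sketch below scores a word by the cosine distance between its averaged usage vectors from two time periods. This is one common approach and an assumption here; the abstract does not state which metric the paper uses. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for embeddings that a model such as BERT or HistBERT would produce.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_shift(embeddings_early, embeddings_late):
    """Score a word's shift as the cosine distance between the mean of its
    usage embeddings in an early period and in a late period.
    (Assumed metric for illustration, not taken from the paper.)"""
    return cosine_distance(mean_vector(embeddings_early),
                           mean_vector(embeddings_late))


# Toy vectors standing in for contextual embeddings of one target word.
stable_early = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
stable_late = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.0]]
shifted_early = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
shifted_late = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]

stable_score = semantic_shift(stable_early, stable_late)
shifted_score = semantic_shift(shifted_early, shifted_late)
print(f"stable word shift:  {stable_score:.3f}")
print(f"shifted word shift: {shifted_score:.3f}")
```

In a comparison like the one the abstract describes, the same scoring would be run twice, once with embeddings from the original BERT and once with embeddings from HistBERT, and the resulting scores checked against a reference ranking of semantic change.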
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Adam · Attention Dropout · Residual Connection · Softmax · Linear Warmup With Linear Decay
