Learning language variations in news corpora through differential embeddings
Carlos Selmo, Julian F. Martinez, Mariano G. Beiró, J. Ignacio Alvarez-Hamelin

TL;DR
This paper introduces a star-like embedding model that captures language variations across time and regions by learning from multiple corpora simultaneously, demonstrated on newspapers from the US and UK.
Contribution
The proposed model enables simultaneous learning of language variations across different slices, capturing temporal and regional differences effectively.
Findings
Captured temporal dynamics (semantic drift) across the yearly slices of The New York Times and The Guardian.
Captured language variations between US and UK English in a curated multi-source corpus.
Provided an extensive evaluation of the methodology.
Abstract
There is an increasing interest in the NLP community in capturing variations in the usage of language, either through time (i.e., semantic drift), across regions (as dialects or variants) or in different social contexts (i.e., professional or media technolects). Several successful dynamical embeddings have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.
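The star-like structure described in the abstract can be illustrated with a minimal sketch. It assumes the slice-dependent contribution is simply added to the central word vector; the abstract does not state the exact parameterization or training objective, so the additive form, the array shapes, the random values, and the function names below are all hypothetical and stand in for learned parameters.

```python
import numpy as np

def slice_embeddings(central, offsets):
    """Combine a shared central vector with a per-slice offset for every word.

    central: (V, d) array, one vector per vocabulary word (the hub of the star).
    offsets: (S, V, d) array, one offset per slice and word (the spokes).
    Returns an (S, V, d) array of slice-specific embeddings.
    Assumption: the combination is additive; the paper may differ.
    """
    return central[None, :, :] + offsets

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy, randomly initialised parameters (not trained, for illustration only).
rng = np.random.default_rng(0)
V, d, S = 5, 8, 3                      # vocabulary size, dimension, number of slices
central = rng.normal(size=(V, d))
offsets = 0.1 * rng.normal(size=(S, V, d))
offsets[2, 0] += 2.0                   # hypothetical: word 0 is used differently in slice 2

emb = slice_embeddings(central, offsets)

# Per-word distance between slice 0 and slice 2, a simple proxy for variation.
drift = [cosine_distance(emb[0, w], emb[2, w]) for w in range(V)]
print(drift)
```

Under this assumption, slices could stand for years (temporal drift) or for sources such as US and UK newspapers (regional variation), and comparing a word's vectors across slices gives a rough measure of how much its usage differs.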
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Computational and Text Analysis Methods
