Building a language evolution tree based on word vector combination model
Zhu Gao, Yanhui Jiang, Junhui Gao

TL;DR
This paper presents a novel method for constructing a language evolution tree using word vector combination, hierarchical clustering, and similarity measures applied to literary corpora spanning several centuries.
Contribution
It introduces a new approach to model language evolution through combined word vectors and clustering, validated across diverse literary themes and parameters.
Findings
The language evolution tree correlates with historical timelines.
The method is stable across different themes and parameters.
It effectively captures language change over centuries.
Abstract
In this paper, we explore the evolution of language through a computational case study. First, we selected novels by eleven British writers active between 1400 and 2005. Then, we used a natural language processing tool to build the corresponding eleven corpora and computed word vectors for 100 high-frequency words in each corpus. Next, for each corpus, we concatenated its 100 word vectors end to end into a single combined vector. Finally, we applied similarity comparison and hierarchical clustering to the eleven combined vectors to generate a relationship tree, which represents the relationships among the eleven corpora. We found that, in the tree generated by clustering, the distances between corpora are broadly consistent with the gaps between their corresponding years. This means that we have discovered a specific language evolution…
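The concatenate-then-cluster step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy two-dimensional vectors, the corpus labels, the two-word vocabulary, and the helper names (`concatenate`, `cosine_distance`, `average_linkage_tree`) are all hypothetical, and cosine distance with average linkage is an assumed choice, since the abstract does not name the similarity measure or linkage rule. It also assumes every corpus uses the same word list in the same order.

```python
import math
from itertools import combinations

def concatenate(word_vectors, vocab):
    """Join the vectors of the words in `vocab`, in order, into one long vector."""
    combined = []
    for word in vocab:
        combined.extend(word_vectors[word])
    return combined

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage_tree(labels, vectors):
    """Agglomerative clustering (average linkage); returns nested tuples of labels."""
    # Each cluster is (subtree, list of member indices).
    clusters = [(label, [i]) for i, label in enumerate(labels)]
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(labels)), 2)
    }

    def cluster_distance(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in a[1] for j in b[1]]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]

# Toy stand-in for the paper's setup (hypothetical data): 4 corpora instead of 11,
# 2 shared words instead of 100, and 2-dimensional vectors.
vocab = ["the", "and"]
corpora = {
    "corpus_1400": {"the": [1.0, 0.0], "and": [0.9, 0.1]},
    "corpus_1500": {"the": [0.9, 0.1], "and": [1.0, 0.0]},
    "corpus_1900": {"the": [0.0, 1.0], "and": [0.1, 0.9]},
    "corpus_2000": {"the": [0.1, 0.9], "and": [0.0, 1.0]},
}

labels = list(corpora)
combined = [concatenate(corpora[name], vocab) for name in labels]
tree = average_linkage_tree(labels, combined)
print(tree)
```

In this toy input the two early corpora end up in one subtree and the two late corpora in the other, which is the kind of chronological grouping the abstract reports. With real data, the per-corpus word vectors would come from an embedding model trained on each corpus.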
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
