Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia

Sergey Zinin; Yang Xu

arXiv:2005.08793 · cs.CL · May 19, 2020 · 5 cites

TL;DR

This paper introduces an open-source corpus of Chinese dynastic histories spanning 2000 years, enabling computational analysis of historical language use, with a focus on gender-specific terms and semantic stability over time.

Contribution

The paper provides the first open-source, annotated corpus of Chinese dynastic histories and develops a methodology for analyzing gendered language and semantic change in Classical Chinese.

Findings

01. Male terms dominate historical gender references

02. Gender-specific terms show considerable stability over two millennia

03. Keyword analysis reveals meaningful semantic representations

Abstract

Chinese dynastic histories form a large, continuous linguistic space of approximately 2000 years, from the 3rd century BCE to the 18th century CE. The histories are written in Classical (Literary) Chinese and together make up a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change. However, there is no freely available open-source corpus of these histories, which leaves Classical Chinese a low-resource language. This project introduces a new open-source corpus of the twenty-four dynastic histories, released under a Creative Commons license. An original list of Classical Chinese gender-specific terms was developed as a case study for analyzing the historical linguistic use of male and female terms. The study demonstrates considerable stability in the usage of these terms, with male terms dominating. Exploration of word meanings uses keyword analysis of focus…
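The kind of frequency comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: the term lists below are hypothetical examples of common kinship characters, not the paper's original list, and the function names `term_rates` and `gender_summary` and the per-10,000-character normalization are assumptions made for illustration.

```python
# Hypothetical term lists for illustration only; NOT the paper's list.
MALE_TERMS = ["父", "夫", "兄", "弟"]    # father, husband, elder/younger brother
FEMALE_TERMS = ["母", "妻", "姊", "妹"]  # mother, wife, elder/younger sister


def term_rates(text, terms, per=10_000):
    """Return occurrences of each term per `per` non-whitespace characters."""
    total = sum(1 for c in text if not c.isspace())
    if total == 0:
        return {t: 0.0 for t in terms}
    # str.count also works for multi-character terms.
    return {t: text.count(t) * per / total for t in terms}


def gender_summary(histories):
    """Map each title to (male_rate, female_rate), summed over the term lists."""
    summary = {}
    for title, text in histories.items():
        male = sum(term_rates(text, MALE_TERMS).values())
        female = sum(term_rates(text, FEMALE_TERMS).values())
        summary[title] = (male, female)
    return summary


# Toy input: a made-up string, not text from the corpus.
toy = {"sample_a": "父母兄弟夫妻父兄"}
for title, (male, female) in gender_summary(toy).items():
    print(title, male, female)
```

Normalizing by text length makes histories of very different sizes comparable. A real analysis would also have to handle polysemous characters and word segmentation, which this sketch ignores.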

Taxonomy

Topics: Gender Studies in Language · Computational and Text Analysis Methods · Authorship Attribution and Profiling