Topic Modeling the Hàn diăn Ancient Classics
Colin Allen, Hongliang Luo, Jaimie Murdock, Jianghuai Pu, Xiaohong Wang, Yanjie Zhai, Kun Zhao

TL;DR
This paper applies probabilistic topic modeling to a large corpus of ancient Chinese texts to facilitate new insights and interpretations in humanities research, addressing unique challenges of ancient Chinese language and content.
Contribution
It introduces a novel application of topic modeling to ancient Chinese classics, demonstrating how computational methods can enhance understanding of culturally significant texts.
Findings
Effective topic modeling of ancient Chinese texts demonstrated
Software aids discovery of themes and interpretations
Implications for computational humanities research
Abstract
Ancient Chinese texts present an area of enormous challenge and opportunity for humanities scholars interested in exploiting computational methods to assist in the development of new insights and interpretations of culturally significant materials. In this paper we describe a collaborative effort between Indiana University and Xi'an Jiaotong University to support exploration and interpretation of a digital corpus of over 18,000 ancient Chinese documents, which we refer to as the "Handian" ancient classics corpus (Hàn diăn gŭ jí, i.e., the "Han canon" or "Chinese classics"). It contains classics of ancient Chinese philosophy, documents of historical and biographical significance, and literary works. We begin by describing the Digital Humanities context of this joint project, and the advances in humanities computing that made this project feasible. We describe the corpus and…
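As a rough illustration of what probabilistic topic modeling over such a corpus involves, the sketch below runs a small collapsed Gibbs sampler for latent Dirichlet allocation (LDA) on a few toy strings. It is not the authors' pipeline: the choice of LDA with Gibbs sampling, the one-character-per-token segmentation, the function names `char_tokens` and `lda_gibbs`, the hyperparameter values, and the sample strings are all assumptions made for illustration.

```python
import random
from collections import Counter


def char_tokens(text):
    """Treat each CJK ideograph as one token (an illustrative assumption)."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def lda_gibbs(docs, num_topics=2, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assign = []                                          # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy input: three short classical phrases standing in for whole documents.
texts = ["學而時習之，不亦說乎", "道可道，非常道", "名可名，非常名"]
docs = [char_tokens(t) for t in texts]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iters=20)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

With real data each document would be far longer and the number of topics much larger. The per-document topic counts in `doc_topic` are the kind of output an exploration interface could use to group or rank documents by theme.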
Taxonomy
Topics: Computational and Text Analysis Methods · Advanced Text Analysis Techniques · Topic Modeling
