TL;DR
This paper introduces a support vector machine method for authorship analysis of classical Chinese literature. It finds a stylistic divide within Dream of the Red Chamber and checks the method against other novels.
Contribution
The paper develops a new quantitative approach for authorship analysis employing support vector machines and relative frequency features, providing robust evidence for stylistic divides within a classic novel.
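To make the idea of relative frequency features concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The function names, the character-level tokenization, and the ranking score (absolute difference in mean relative frequency between two chapter groups) are all assumptions made for illustration; the paper's own definition of its ranking metric may differ.

```python
from collections import Counter


def relative_frequencies(text, vocabulary):
    """Occurrences of each vocabulary character divided by the text length.

    Assumption: features are single characters, counted per chapter.
    """
    counts = Counter(text)
    total = len(text)
    if total == 0:
        return {tok: 0.0 for tok in vocabulary}
    return {tok: counts[tok] / total for tok in vocabulary}


def rank_features(group_a, group_b, vocabulary):
    """Order vocabulary by how strongly each character separates two groups.

    Hypothetical score: |mean relative frequency in A - mean in B|.
    """
    def mean_freqs(chapters):
        per_chapter = [relative_frequencies(ch, vocabulary) for ch in chapters]
        return {
            tok: sum(freqs[tok] for freqs in per_chapter) / len(per_chapter)
            for tok in vocabulary
        }

    mean_a = mean_freqs(group_a)
    mean_b = mean_freqs(group_b)
    scores = {tok: abs(mean_a[tok] - mean_b[tok]) for tok in vocabulary}
    return sorted(vocabulary, key=lambda tok: scores[tok], reverse=True)


# Toy stand-ins for two chapter groups (not real text from any novel).
early_chapters = ["aaab", "aaac"]
late_chapters = ["abbb", "abcc"]
ranking = rank_features(early_chapters, late_chapters, ["c", "b", "a"])
print(ranking)
```

In a fuller pipeline, the per-chapter relative frequency vectors for the top-ranked features would serve as input to a support vector machine classifier that separates the two groups of chapters.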
Findings
Chapters 1-80 and Chapters 81-120 of Dream of the Red Chamber were likely written by different authors.
The authorship of Chapter 67 is uncertain; it may not have been written by Cao Xueqin.
The method finds no chrono-divides in other tested classical Chinese novels.
Abstract
Inspired by the authorship controversy of Dream of the Red Chamber and the application of machine learning in the study of literary stylometry, we develop a rigorous new method for the mathematical analysis of authorship by testing for a so-called chrono-divide in writing styles. Our method incorporates some of the latest advances in the study of authorship attribution, particularly techniques from support vector machines. By introducing the notion of relative frequency as a feature ranking metric, our method proves to be highly effective and robust. Applying our method to the Cheng-Gao version of Dream of the Red Chamber has led to convincing, if not irrefutable, evidence that the first chapters and the last chapters of the book were written by two different authors. Furthermore, our analysis has unexpectedly provided strong support to the hypothesis that Chapter 67 was not…
