A Data-Oriented Model of Literary Language
Andreas van Cranenburgh, Rens Bod

TL;DR
This paper introduces a data-oriented model that predicts how literary a text is from lexical and syntactic features, explaining 76.0% of the variation in human literary ratings.
Contribution
It is the first model to distinguish degrees of literariness in novels, combining syntactic tree fragments mined from the training set with hand-picked features.
Findings
Explains 76.0% of the variation in human literary ratings.
Outperforms a standard bigram baseline.
Uses rich syntactic tree fragments, mined from the training set, as features for literary assessment.
Abstract
We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings.
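As a rough illustration of two ingredients named in the abstract, the sketch below counts word bigrams (the kind of feature a bigram baseline is built on) and computes the share of variation explained. It assumes that "explains 76.0% of the variation" refers to the coefficient of determination, R² = 1 − SS_res / SS_tot. The function names and the toy ratings are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs (word bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, not results from the paper.
features = bigram_counts("the cat sat on the cat".split())
ratings = [2.0, 4.0, 6.0]       # assumed human ratings
predictions = [2.5, 4.0, 5.5]   # assumed model outputs
score = r_squared(ratings, predictions)
```

Under this reading, a score of 0.760 would correspond to the 76.0% reported above; a model that always predicts the mean rating scores 0.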
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
