A Data-Oriented Model of Literary Language

Andreas van Cranenburgh; Rens Bod

arXiv:1701.03329 · cs.CL · April 12, 2017 · 1 citation

TL;DR

This paper introduces a data-oriented model that predicts how literary a text is from lexical and syntactic features, explaining 76.0% of the variation in human literary ratings.

Contribution

According to the authors, it is the first model to distinguish degrees of literariness in novels, combining syntactic tree fragments mined from the training set with hand-picked features.

Findings

01. Explains 76.0% of the variation in literary ratings.

02. Outperforms a standard bigram baseline.

03. Uses rich syntactic features to assess how literary a text is.

Abstract

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings.
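Two ideas in the abstract can be illustrated with a minimal sketch: the word-bigram counts that a bigram baseline is built on, and the "variation explained" figure, read here as a coefficient of determination (R², so 76.0% would correspond to 0.76). That reading is an assumption, as are the whitespace tokenization, the function names, and the toy numbers. None of this is the authors' code or data.

```python
from collections import Counter


def word_bigrams(text):
    """Count adjacent word pairs in a lowercased, whitespace-tokenized text.

    Whitespace tokenization is a simplifying assumption for illustration.
    """
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings and predictions, invented for this example only.
ratings = [2.0, 4.0, 6.0, 8.0]
predictions = [3.0, 4.0, 6.0, 7.0]

bigram_counts = word_bigrams("the cat sat on the cat")
score = r_squared(ratings, predictions)
print(bigram_counts[("the", "cat")], round(score, 3))
```

In a full pipeline, bigram counts like these would be turned into feature vectors for a regression model, and R² would be computed on held-out novels rather than on toy values.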

Code & Models

Repositories

andreasvc/literariness

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques