TL;DR
This study compares different linguistic features to improve automatic age-based classification of fiction texts, aiding recommendations, filtering, and content suitability assessments.
Contribution
It provides an empirical evaluation of various feature types, highlighting the effectiveness of document-level features for age classification.
Findings
Document-level features significantly improve classification accuracy
Lexical and grammatical features are particularly effective
Publishing attributes also contribute to model performance
Abstract
The ability to automatically determine the intended age group of a novel's readers provides many opportunities for the development of information retrieval tools. Firstly, developers of book recommendation systems and electronic libraries may be interested in filtering texts by the age of the most likely readers. Further, parents may want to select literature for children. Finally, it will be useful for writers and publishers to determine which features influence whether the texts are suitable for children. In this article, we compare the empirical effectiveness of various types of linguistic features for the task of age-based classification of fiction texts. For this purpose, we collected a text corpus of book previews labeled with one of two categories -- children's or adult. We evaluated the following types of features: readability indices, sentiment, lexical, grammatical and general features, and…
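To make the notion of document-level features concrete, here is a minimal sketch of how a few such features could be computed for one book preview. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which readability indices were used or what language the corpus is in, so the Flesch Reading Ease formula, the English-oriented syllable heuristic, and the function name `document_features` are all chosen here for illustration only.

```python
import re


def split_sentences(text):
    # Naive sentence split on runs of ., ! or ? (an assumption; real
    # pipelines would use a proper sentence tokenizer).
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def tokenize(text):
    # Lowercased alphabetic tokens; assumes English-like text.
    return re.findall(r"[A-Za-z']+", text.lower())


def count_syllables(word):
    # Rough heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def document_features(text):
    """Return a few illustrative document-level features for one text."""
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        # General features
        "avg_sentence_length": words_per_sentence,
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Lexical diversity
        "type_token_ratio": len(set(words)) / len(words),
        # One example of a readability index (Flesch Reading Ease)
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast!"
    for name, value in document_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this, computed once per preview, could then be passed to any binary classifier that separates the children's and adult categories.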
Taxonomy
Methods: Linear Layer · Residual Connection · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Adam · Dropout · Softmax · Dense Connections
