A Comparative Study of Feature Types for Age-Based Text Classification

Anna Glazkova; Yury Egorov; Maksim Glazkov

arXiv:2009.11898 · cs.CL · August 30, 2021

TL;DR

This study compares different linguistic features to improve automatic age-based classification of fiction texts, aiding recommendations, filtering, and content suitability assessments.

Contribution

The paper provides an empirical comparison of several feature types for age-based classification and highlights the effectiveness of document-level features.

Findings

01. Document-level features significantly improve classification accuracy

02. Lexical and grammatical features are particularly effective

03. Publishing attributes also contribute to model performance

Abstract

The ability to automatically determine the age audience of a novel provides many opportunities for the development of information retrieval tools. Firstly, developers of book recommendation systems and electronic libraries may be interested in filtering texts by the age of the most likely readers. Further, parents may want to select literature for children. Finally, it will be useful for writers and publishers to determine which features influence whether the texts are suitable for children. In this article, we compare the empirical effectiveness of various types of linguistic features for the task of age-based classification of fiction texts. For this purpose, we collected a text corpus of book previews labeled with one of two categories -- children's or adult. We evaluated the following types of features: readability indices, sentiment, lexical, grammatical and general features, and…
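To make the idea of document-level features concrete, here is a minimal sketch that computes three simple per-document statistics from raw text. It is an illustration only and is not taken from the paper or its repository: the function name `document_features`, the choice of the three features, and the naive tokenization are all assumptions made for this example.

```python
import re


def document_features(text: str) -> dict:
    """Compute a few simple document-level features (hypothetical example)."""
    # Naive sentence split on runs of '.', '!' or '?'; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens; \w+ also matches non-Latin letters.
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    if n_words == 0 or not sentences:
        # Degenerate input: return zeros instead of dividing by zero.
        return {
            "avg_sentence_len": 0.0,
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": n_words / len(sentences),
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # Share of distinct words among all words (lexical diversity).
        "type_token_ratio": len(set(words)) / n_words,
    }


features = document_features("The cat sat. The cat ran!")
```

A dictionary like `features` could be turned into a numeric vector and passed to any standard classifier. Which features and which classifier work best is exactly the kind of question the paper studies, so this sketch should not be read as the authors' feature set.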

Code & Models

Repositories

oldaandozerskaya/age_based_classification (Official)

Taxonomy

Methods: Linear Layer · Residual Connection · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Adam · Dropout · Softmax · Dense Connections