The Clustering of Author's Texts of English Fiction in the Vector Space of Semantic Fields

Bohdan Pavlyshenko

arXiv:1212.1478·cs.CL·December 10, 2012

TL;DR

This paper shows that a vector space model built on semantic fields clusters English fiction texts effectively by author, that the resulting clusters reveal the idiolects of individual authors, and that SVD significantly reduces the dimension of the semantic space.

Contribution

It introduces a semantic fields basis for text clustering and shows its effectiveness in identifying author-specific semantic patterns in fiction texts.

Findings

01

Semantic fields basis is effective for clustering author's texts

02

SVD reduces semantic space dimensionality

03

Distinct author idiolects are identifiable in clusters

Abstract

The clustering of text documents in the vector space of semantic fields and in the semantic space with orthogonal basis has been analysed. It is shown that using the vector space model with the basis of semantic fields is effective in the cluster analysis algorithms of author's texts in English fiction. The analysis of the distribution of author's texts in the cluster structure showed the presence of areas of the semantic space that represent the idiolects of individual authors. SVD factorization of the semantic fields matrix makes it possible to significantly reduce the dimension of the semantic space in the cluster analysis of author's texts.
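The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the paper's implementation: the three semantic fields, their word lists, the toy documents, the relative-frequency weighting, and the documents-as-rows orientation are all assumptions made for illustration, and the names `SEMANTIC_FIELDS`, `field_vector`, `reduce_with_svd`, and `cosine` are hypothetical. The sketch maps each text to a vector of semantic-field frequencies, reduces the document matrix with a truncated SVD, and compares documents in the reduced space.

```python
import numpy as np

# Hypothetical semantic fields (assumption): each field is a set of lexemes.
SEMANTIC_FIELDS = {
    "motion": {"run", "walk", "ride", "sail"},
    "emotion": {"love", "fear", "joy", "grief"},
    "nature": {"sea", "river", "forest", "wind"},
}


def field_vector(text, fields=SEMANTIC_FIELDS):
    """Relative frequency of each semantic field in a text (assumed weighting)."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)  # guard against empty texts
    return np.array(
        [sum(t in words for t in tokens) / n for words in fields.values()]
    )


def reduce_with_svd(matrix, k):
    """Project rows (documents) onto the top-k singular directions."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy documents (assumption): two nature-heavy texts, two emotion-heavy texts.
docs = [
    "the wind and the sea and the river",
    "forest wind over the sea",
    "love and fear and joy and grief",
    "grief and love with joy",
]

X = np.vstack([field_vector(d) for d in docs])  # shape: documents x fields
Z = reduce_with_svd(X, k=2)                     # shape: documents x 2

same_style = cosine(Z[0], Z[1])   # both nature-heavy
cross_style = cosine(Z[0], Z[2])  # nature-heavy vs emotion-heavy
```

In this toy setting the two nature-heavy texts stay close in the reduced space while the emotion-heavy text is far from them. A clustering algorithm of the reader's choice could then be run on the rows of `Z` instead of the full field-frequency matrix `X`.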
