The Clustering of Author's Texts of English Fiction in the Vector Space of Semantic Fields
Bohdan Pavlyshenko

TL;DR
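This paper shows that a vector space model built on semantic fields clusters English fiction texts by author effectively, revealing regions of the semantic space that correspond to individual authors' idiolects. SVD factorization of the semantic fields matrix significantly reduces the dimension of that space.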
Contribution
It introduces a semantic fields basis for text clustering and shows its effectiveness in identifying author-specific semantic patterns in fiction texts.
Findings
A semantic fields basis is effective for clustering texts by author
SVD factorization significantly reduces the dimension of the semantic space
Distinct author idiolects are identifiable as regions of the cluster structure
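As a rough illustration of the SVD finding, the sketch below truncates the SVD of a small field-by-document matrix and projects the documents into a lower-dimensional space. The matrix values, the choice of `k`, and all variable names are assumptions made for this example, not data or settings from the paper.

```python
import numpy as np

# Hypothetical toy data: rows are semantic fields, columns are documents.
# Rows 3 and 4 are linear combinations of rows 1 and 2, so the rank is 2.
base = np.array([
    [4.0, 3.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 3.0],
])
field_doc = np.vstack([base, base[0] + base[1], 2.0 * base[0]])

# Thin SVD: field_doc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(field_doc, full_matrices=False)

k = 2  # number of singular directions to keep (an assumed choice)

# Coordinates of each document in the reduced k-dimensional space.
docs_reduced = (np.diag(s[:k]) @ Vt[:k, :]).T  # shape: (n_docs, k)

# Rank-k reconstruction, used to check how much information was kept.
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
relative_error = np.linalg.norm(field_doc - approx) / np.linalg.norm(field_doc)

print(docs_reduced.shape)
print(relative_error)
```

Because the toy matrix has rank 2 by construction, keeping two singular directions loses essentially nothing here. On real texts the reconstruction error would depend on how quickly the singular values decay.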
Abstract
The clustering of text documents in the vector space of semantic fields and in the semantic space with an orthogonal basis has been analysed. It is shown that a vector space model with a basis of semantic fields is effective in cluster analysis of authors' texts in English fiction. The distribution of the texts across the cluster structure reveals areas of the semantic space that represent the idiolects of individual authors. SVD factorization of the semantic fields matrix makes it possible to significantly reduce the dimension of the semantic space in the cluster analysis of authors' texts.
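To make the idea of a semantic fields basis concrete, here is a minimal sketch of how a text could be mapped to a vector whose components are semantic fields rather than individual words. The three fields, their word lists, the relative-frequency weighting, and the cosine comparison are all assumptions for illustration. The abstract does not specify the lexicon, the weighting, or the clustering algorithm used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical semantic fields; a real study would use a full lexicon.
SEMANTIC_FIELDS = {
    "motion": {"walk", "run", "ride", "travel"},
    "emotion": {"love", "fear", "joy", "grief"},
    "nature": {"sea", "river", "forest", "storm"},
}
FIELD_NAMES = sorted(SEMANTIC_FIELDS)


def field_vector(text):
    """Return the relative frequency of each semantic field in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        for name, lexemes in SEMANTIC_FIELDS.items():
            if word in lexemes:
                counts[name] += 1
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[name] / total for name in FIELD_NAMES]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented sample sentences standing in for texts by two authors.
texts = {
    "author_a_1": "The storm broke over the sea and the river rose in the forest",
    "author_a_2": "A forest by the sea where the storm met the river",
    "author_b_1": "Her love turned to fear and then to grief and joy",
}
vectors = {name: field_vector(t) for name, t in texts.items()}

sim_same = cosine(vectors["author_a_1"], vectors["author_a_2"])
sim_diff = cosine(vectors["author_a_1"], vectors["author_b_1"])
print(sim_same, sim_diff)
```

The basis has one dimension per field, so its size is fixed by the lexicon rather than by the vocabulary of the corpus. Vectors like these could then be passed to any standard clustering algorithm.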
