Notes on Coalgebras in Stylometry

Joël A. Doat

arXiv:2010.02733 · cs.CL · August 10, 2021

Open Access

TL;DR

This paper explores how coalgebras can formalize and measure the syntactic behavior of texts in stylometry, enabling quantitative comparison of texts through a probabilistic transition system framework.

Contribution

It introduces a coalgebraic approach to model text behavior and proposes a polynomial-time algorithm to approximate behavioral distances for text comparison.

Findings

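01 Coalgebraic models embed the syntactic features of a text into probabilistic transition systems.

02 Behavioral distance quantifies differences between points in these systems, and thus between features of different texts.

03 The behavioral distance can be approximated by a polynomial-time algorithm.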

Abstract

The syntactic behaviour of texts can vary considerably depending on their context (e.g. author or genre). From the standpoint of stylometry, it can be helpful to measure this behaviour objectively. In this paper, we discuss how coalgebras are used to formalise the notion of behaviour by embedding syntactic features of a given text into probabilistic transition systems. By introducing the behavioural distance, we are then able to quantitatively measure differences between points in these systems and thus compare features of different texts. Furthermore, the behavioural distance of points can be approximated by a polynomial-time algorithm.
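To make the idea of embedding syntactic features into a probabilistic transition system more concrete, the following is a minimal sketch and not the paper's construction. It assumes that the syntactic features are part-of-speech tags and that transition probabilities are estimated as relative bigram frequencies; the function name `transition_system` and the tag sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_system(tags):
    """Estimate a probabilistic transition system from a tag sequence.

    Assumption: each distinct tag is a state, and the probability of moving
    from tag a to tag b is the relative frequency of the bigram (a, b)
    among all bigrams that start with a.
    """
    counts = defaultdict(Counter)
    for current, following in zip(tags, tags[1:]):
        counts[current][following] += 1
    return {
        state: {
            nxt: n / sum(successors.values())
            for nxt, n in successors.items()
        }
        for state, successors in counts.items()
    }


# Hypothetical part-of-speech sequence standing in for a tagged text.
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
system = transition_system(tags)

for state, successors in sorted(system.items()):
    print(state, successors)
```

Each state of the resulting system carries a probability distribution over successor states, which is the kind of structure on which a behavioural distance between points can be defined. The sketch stops at building the system and does not compute that distance.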

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Rough Sets and Fuzzy Logic