# Word embeddings for idiolect identification

**Authors:** Konstantinos Perifanos, Eirini Florou, Dionysis Goutsos

arXiv: 1902.03658 · 2019-02-12

## TL;DR

This paper investigates whether embeddings of social media users, learned with neural probabilistic language models (such as word2vec) or matrix factorization (such as GloVe), can serve as stylistic fingerprints for identifying an individual's idiolect, the theoretical foundation of authorship attribution.

## Contribution

The paper compares neural probabilistic language models (such as word2vec) and matrix factorization techniques (such as GloVe) for learning embeddings of social media users that reflect their writing style.

## Key findings

- User embeddings reflect individual writing style and can be treated as stylistic fingerprints of the authors.
- Neural models and matrix factorization show comparable performance.
- Stylistic embeddings improve authorship attribution accuracy.

## Abstract

The term idiolect refers to the unique and distinctive use of language of an individual and it is the theoretical foundation of Authorship Attribution. In this paper we are focusing on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered as stylistic fingerprints of the authors. We are exploring the performance of the two main flavours of distributed representations, namely embeddings produced by Neural Probabilistic Language models (such as word2vec) and matrix factorization (such as GloVe).
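To make the matrix factorization flavour mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' method. It assumes a hypothetical toy corpus (`POSTS`), a user-by-term matrix of log-scaled counts, and plain stochastic gradient descent on squared error. GloVe itself factorizes word co-occurrence statistics with a weighted objective, so this is only a simplified stand-in. All names, data, and hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical toy corpus, invented purely for illustration.
POSTS = {
    "user_a": ["lol that was so good tbh", "tbh i cant even lol"],
    "user_b": ["I would argue that the result is good.",
               "The result is, I would argue, clear."],
    "user_c": ["so good so good!!!", "cant wait!!! so clear"],
}


def tokenize(text):
    # Lowercased whitespace split; punctuation stays attached because it carries style.
    return text.lower().split()


def build_counts(posts):
    # counts[i][j] is how often user i used term j.
    users = sorted(posts)
    vocab = sorted({tok for texts in posts.values()
                    for text in texts for tok in tokenize(text)})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in users]
    for i, user in enumerate(users):
        for text in posts[user]:
            for tok in tokenize(text):
                counts[i][index[tok]] += 1
    return users, vocab, counts


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_error(target, U, V):
    # Total reconstruction error of the low-rank approximation U V^T.
    return sum((target[i][j] - dot(U[i], V[j])) ** 2
               for i in range(len(U)) for j in range(len(V)))


def factorize(counts, dim=2, epochs=300, lr=0.05, seed=0):
    # Approximate log(1 + count) by the product of user vectors U and term vectors V.
    rng = random.Random(seed)
    target = [[math.log1p(c) for c in row] for row in counts]
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in counts]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in counts[0]]
    loss_before = squared_error(target, U, V)
    for _ in range(epochs):
        for i in range(len(U)):
            for j in range(len(V)):
                err = target[i][j] - dot(U[i], V[j])
                for d in range(dim):
                    u, v = U[i][d], V[j][d]
                    U[i][d] += lr * err * v
                    V[j][d] += lr * err * u
    return U, V, loss_before, squared_error(target, U, V)


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


users, vocab, counts = build_counts(POSTS)
user_vecs, term_vecs, loss_before, loss_after = factorize(counts)
embeddings = dict(zip(users, user_vecs))  # one "stylistic fingerprint" per user

for a in users:
    for b in users:
        if a < b:
            print(a, b, round(cosine(embeddings[a], embeddings[b]), 3))
```

In this sketch, each row of the factorized user matrix plays the role of a user embedding, and cosine similarity between rows is one simple way to compare fingerprints. A neural alternative in the spirit of word2vec would learn the vectors by predicting words from context rather than by reconstructing counts.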

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03658/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1902.03658/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1902.03658/full.md

---
Source: https://tomesphere.com/paper/1902.03658