Inference through innovation processes tested in the authorship attribution task
Giulio Tani Raffaelli, Margherita Lalli, Francesca Tria

TL;DR
This paper introduces a novel approach for authorship attribution using urn models with triggering, leveraging their connection to Bayesian non-parametric inference to improve accuracy, efficiency, and flexibility in analyzing symbolic sequences.
Contribution
It presents a general method for measuring similarity between sequences based on urn models, relaxing exchangeability assumptions and enhancing inference in complex, non-stationary systems.
Findings
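High accuracy in authorship attribution compared with related methods across different scenarios
Substantial gain in computational efficiency and theoretical transparency
Ability to handle non-stationary, correlated data by relaxing exchangeability assumptions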
Abstract
Urn models for innovation capture fundamental empirical laws shared by several real-world processes. The so-called urn model with triggering includes, as particular cases, the urn representation of the two-parameter Poisson-Dirichlet process and the Dirichlet process, seminal in Bayesian non-parametric inference. In this work, we leverage this connection to introduce a general approach for quantifying closeness between symbolic sequences and test it within the framework of the authorship attribution problem. The method demonstrates high accuracy when compared to other related methods in different scenarios, featuring a substantial gain in computational efficiency and theoretical transparency. Beyond the practical convenience, this work demonstrates how the recently established connection between urn models and non-parametric Bayesian inference can pave the way for designing more…
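To make the urn picture concrete, here is a minimal Python sketch of the sequential (urn-style) predictive rule of the two-parameter Poisson-Dirichlet process, which the abstract names as a particular case of the urn model with triggering. It is an illustration under stated assumptions, not the authors' attribution method: the function names, the parameter names `alpha` (discount, 0 <= alpha < 1) and `theta` (concentration, theta > 0), and the choice to score a sequence by its log-likelihood are all introduced here. The appearance of any new symbol is treated as a single event, so the base distribution over unseen symbols is ignored. Setting `alpha = 0` recovers the Dirichlet process case.

```python
import math
import random
from collections import Counter


def predictive(counts, n, symbol, alpha, theta):
    """Probability of the next symbol given the counts of the n symbols seen so far.

    Assumed rule (two-parameter Poisson-Dirichlet predictive):
      seen symbol with count c -> (c - alpha) / (theta + n)
      any unseen symbol        -> (theta + alpha * k) / (theta + n)
    where k is the number of distinct symbols seen so far.
    """
    k = len(counts)
    if symbol in counts:
        return (counts[symbol] - alpha) / (theta + n)
    return (theta + alpha * k) / (theta + n)


def sample_sequence(length, alpha, theta, seed=0):
    """Generate a sequence; new symbols are labelled 0, 1, 2, ... in order of appearance."""
    rng = random.Random(seed)
    counts = Counter()
    seq = []
    for n in range(length):
        k = len(counts)
        p_new = (theta + alpha * k) / (theta + n)
        if rng.random() < p_new:
            symbol = k  # innovation: a symbol never seen before
        else:
            symbols = list(counts)
            weights = [counts[s] - alpha for s in symbols]
            symbol = rng.choices(symbols, weights=weights)[0]
        counts[symbol] += 1
        seq.append(symbol)
    return seq


def log_likelihood(seq, alpha, theta):
    """Log-probability of a symbolic sequence under the predictive rule above."""
    counts = Counter()
    total = 0.0
    for n, s in enumerate(seq):
        total += math.log(predictive(counts, n, s, alpha, theta))
        counts[s] += 1
    return total
```

Under this rule the sequence is exchangeable: `log_likelihood(list("aba"), 0.5, 1.0)` and `log_likelihood(list("aab"), 0.5, 1.0)` give the same value. That order-independence is the assumption the summary above says the proposed approach relaxes.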
Taxonomy
Topics: Semantic Web and Ontologies
