AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora

Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, Joseph P. Dexter

arXiv:2408.04427 · cs.CL · August 9, 2024


TL;DR

AcrosticSleuth is a probabilistic tool that automatically detects and ranks acrostics in multilingual texts, enabling systematic study of these rare hidden messages, with F1 scores of up to 0.66.

Contribution

The paper introduces AcrosticSleuth, described as a first-of-its-kind probabilistic tool for identifying acrostics, and provides a new dataset, AcrostID, for evaluation across multiple languages.

Findings

1. Achieves F1 scores up to 0.66 on multilingual datasets.

2. Successfully identifies historically significant acrostics.

3. Demonstrates effectiveness in uncovering previously unknown acrostics.

Abstract

For centuries, writers have hidden messages in their texts as acrostics, where initial letters of consecutive lines or paragraphs form meaningful words or phrases. Scholars searching for acrostics manually can only focus on a few authors at a time and often favor qualitative arguments in discussing intentionality. We aim to put the study of acrostics on firmer statistical footing by presenting AcrosticSleuth, a first-of-its-kind tool that automatically identifies acrostics and ranks them by the probability that the sequence of characters does not occur by chance (and therefore may have been inserted intentionally). Acrostics are rare, so we formalize the problem as a binary classification task in the presence of extreme class imbalance. To evaluate AcrosticSleuth, we present the Acrostic Identification Dataset (AcrostID), a collection of acrostics from the WikiSource online database.…
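To make the definition of an acrostic concrete, here is a minimal Python sketch that reads off the initial letter of each line and looks for exact matches against a word list. It is only an illustration of the definition, not AcrosticSleuth's method: the tool described in the abstract ranks candidates by probability, whereas this sketch does naive exact matching. The function names `initial_letters` and `find_candidates`, the `min_len` parameter, and the three-line sample text are all hypothetical.

```python
def initial_letters(lines):
    """Return the first alphabetic character of each line, uppercased.

    Lines that contain no letters at all are skipped.
    """
    letters = []
    for line in lines:
        for ch in line:
            if ch.isalpha():
                letters.append(ch.upper())
                break
    return "".join(letters)


def find_candidates(lines, vocabulary, min_len=3):
    """Naive exact-match search (an assumption for illustration only).

    Returns (start_index, word) pairs, where start_index is a position
    in the string of initial letters and word is a vocabulary entry.
    """
    initials = initial_letters(lines)
    hits = []
    for start in range(len(initials)):
        for end in range(start + min_len, len(initials) + 1):
            word = initials[start:end]
            if word in vocabulary:
                hits.append((start, word))
    return hits


# Hypothetical three-line text whose initials spell "CAT".
poem = [
    "Carefully the author plans each line,",
    "Arranging words so letters align,",
    "Then hides a message in plain sight.",
]

print(initial_letters(poem))            # CAT
print(find_candidates(poem, {"CAT"}))   # [(0, 'CAT')]
```

Exact matching of this kind would flag many short words that arise by chance in a large corpus, which is why ranking candidates by how unlikely they are to occur by chance matters when true acrostics are rare.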


Taxonomy

Topics: Natural Language Processing Techniques
