AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora
Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, Joseph P. Dexter

TL;DR
AcrosticSleuth is a probabilistic tool that automatically detects acrostics in multilingual texts and ranks them by how unlikely they are to have occurred by chance, enabling systematic study of hidden messages despite their rarity.
Contribution
The paper introduces AcrosticSleuth, which the authors describe as a first-of-its-kind probabilistic tool for identifying acrostics, and presents the Acrostic Identification Dataset (AcrostID) for evaluation across multiple languages.
Findings
Achieves F1 scores up to 0.66 on multilingual datasets.
Successfully identifies historically significant acrostics.
Demonstrates effectiveness in uncovering previously unknown acrostics.
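For context on the F1 figure above: F1 is the harmonic mean of precision and recall, and it is more informative than plain accuracy when positives are very rare. The sketch below illustrates this with made-up counts (they are hypothetical and are not results from the paper); the function names are chosen here for illustration only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: 3 true acrostics among 10,000 candidate passages.
# A detector that flags nothing is almost perfectly "accurate" but has F1 = 0.
silent_accuracy = accuracy(tp=0, fp=0, fn=3, tn=9997)
silent_f1 = f1_score(tp=0, fp=0, fn=3)

# A detector that finds 2 of the 3 with 1 false alarm.
useful_accuracy = accuracy(tp=2, fp=1, fn=1, tn=9996)
useful_f1 = f1_score(tp=2, fp=1, fn=1)
```

With these illustrative counts the two detectors differ by only 0.0001 in accuracy, while F1 separates them clearly.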
Abstract
For centuries, writers have hidden messages in their texts as acrostics, where initial letters of consecutive lines or paragraphs form meaningful words or phrases. Scholars searching for acrostics manually can only focus on a few authors at a time and often favor qualitative arguments in discussing intentionality. We aim to put the study of acrostics on firmer statistical footing by presenting AcrosticSleuth, a first-of-its-kind tool that automatically identifies acrostics and ranks them by the probability that the sequence of characters does not occur by chance (and therefore may have been inserted intentionally). Acrostics are rare, so we formalize the problem as a binary classification task in the presence of extreme class imbalance. To evaluate AcrosticSleuth, we present the Acrostic Identification Dataset (AcrostID), a collection of acrostics from the WikiSource online database.…
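To make the idea of "ranking by chance probability" concrete, here is a minimal sketch. It is not AcrosticSleuth's actual model, which the abstract does not specify. It assumes, purely for illustration, that line initials are drawn independently from their empirical frequencies in the text, and it ranks vocabulary words found among the initials by how unlikely they would be under that assumption. All function names and the sample text are hypothetical.

```python
import math
from collections import Counter


def initial_letters(lines):
    """Lowercased first alphabetic character of each line that has one."""
    letters = []
    for line in lines:
        for ch in line:
            if ch.isalpha():
                letters.append(ch.lower())
                break
    return "".join(letters)


def chance_log_prob(candidate, letter_freq):
    """Log-probability of `candidate` if initials were independent draws
    from `letter_freq` (a simplifying assumption, not the paper's model)."""
    return sum(math.log(letter_freq[ch]) for ch in candidate)


def rank_candidates(lines, vocabulary):
    """Return (log_prob, word, start_line) tuples, least likely by chance first."""
    initials = initial_letters(lines)
    counts = Counter(initials)
    freq = {ch: n / len(initials) for ch, n in counts.items()}
    hits = []
    for word in vocabulary:
        start = initials.find(word)
        if start != -1:
            hits.append((chance_log_prob(word, freq), word, start))
    return sorted(hits)


# Hypothetical four-line text whose initials spell "cats".
poem = ["Come away", "And see", "The sky", "So blue"]
ranking = rank_candidates(poem, ["at", "cat", "dog"])
```

Here `ranking` lists "cat" before "at", because a longer match is less likely to arise by chance under the independence assumption; "dog" does not appear among the initials and is omitted.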
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Focus
