B(eo)W(u)LF: Facilitating recurrence analysis on multi-level language
A. Paxton, R. Dale

TL;DR
This technical report introduces B(eo)W(u)LF, a by-word long-form data format with accompanying Python and MATLAB tools for recurrence analysis in multi-level discourse studies, demonstrated on 319 lines of Beowulf translated into modern English.
Contribution
The report presents a by-word long-form data format and associated Python and MATLAB tools for recurrence analysis of discourse, supporting multi-level linguistic investigations. The authors do not claim to be the first to use methods along these lines.
Findings
Developed B(eo)W(u)LF data format for discourse analysis.
Created Python and MATLAB tools for recurrence analysis.
Demonstrated the methods on 319 lines of Beowulf translated into modern English.
Abstract
Discourse analysis may seek to characterize not only the overall composition of a given text but also the dynamic patterns within the data. This technical report introduces a data format intended to facilitate multi-level investigations, which we call the by-word long-form or B(eo)W(u)LF. Inspired by the long-form data format required for mixed-effects modeling, B(eo)W(u)LF structures linguistic data into an expanded matrix encoding any number of researcher-specified markers, making it ideal for recurrence-based analyses. While we do not necessarily claim to be the first to use methods along these lines, we have created a series of tools utilizing Python and MATLAB to enable such discourse analyses and demonstrate them using 319 lines of the Old English epic poem, Beowulf, translated into modern English.
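To make the by-word long-form idea concrete, the following is a minimal Python sketch, not the authors' released tools. It expands text into one row per word, attaches a researcher-specified marker to each row, and builds a simple categorical recurrence matrix over the word column. The function names (`to_bowulf`, `recurrence_matrix`, `recurrence_rate`), the column names, the `is_long` marker, and the two sample lines are all hypothetical and chosen only for illustration. The sample text is invented and is not taken from Beowulf.

```python
import re


def to_bowulf(lines, markers=None):
    """Expand text lines into one row per word (hypothetical column layout)."""
    markers = markers or {}
    rows = []
    position = 0  # running word index across the whole text
    for line_no, line in enumerate(lines, start=1):
        for token in re.findall(r"[a-z']+", line.lower()):
            row = {"line": line_no, "position": position, "word": token}
            # Each marker is a function of the word; results become extra columns.
            for name, fn in markers.items():
                row[name] = fn(token)
            rows.append(row)
            position += 1
    return rows


def recurrence_matrix(values):
    """Categorical recurrence: cell (i, j) is 1 when values i and j are equal."""
    n = len(values)
    return [[1 if values[i] == values[j] else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(matrix):
    """Share of off-diagonal cells that are recurrent."""
    n = len(matrix)
    if n < 2:
        return 0.0
    hits = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


# Invented sample text, used only to exercise the functions.
sample = ["the king gave the ring", "the ring was gold"]
rows = to_bowulf(sample, markers={"is_long": lambda w: int(len(w) > 3)})
words = [r["word"] for r in rows]
rp = recurrence_matrix(words)
rate = recurrence_rate(rp)
```

Because every row carries its line number and word position alongside the markers, the same table can feed a recurrence analysis over any column (for example `is_long` instead of `word`) without restructuring the data, which is the property the abstract attributes to the long-form layout.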
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
