Multiple regression techniques for modeling dates of first performances of Shakespeare-era plays
Pablo Moscato, Hugh Craig, Gabriel Egan, Mohammad Nazmul Haque, Kevin Huang, Julia Sloan, Jon Corrales de Oliveira

TL;DR
This study evaluates multiple regression techniques, including a novel memetic algorithm-based continued fraction regression (CFR), for predicting the first performance dates of Shakespeare-era plays from linguistic features, with the aim of supporting literary history and the study of authorial style.
Contribution
Introduces a memetic algorithm-based CFR method for interpretable, low-dimensional modeling of play dates, improving prediction accuracy and linguistic analysis.
Findings
CFR achieved accurate date predictions with fewer variables.
Identified linguistic patterns correlating with play genres.
Demonstrated the importance of stylistic features in dating plays.
Abstract
The date of the first performance of a play of Shakespeare's time must usually be guessed with reference to multiple indirect external sources, or to some aspect of the content or style of the play. Identifying these dates is important to literary history and to accounts of developing authorial styles, such as Shakespeare's. In this study, we took a set of Shakespeare-era plays (181 plays from the period 1585–1610), added the best-guess dates for them from a standard reference work as metadata, and calculated a set of probabilities of individual words in these samples. We applied 11 regression methods to predict the dates of the plays using an 80/20 training/test split. We withdrew one play at a time, used the best-guess date metadata with the probabilities and weightings to infer its date, and thus built a model of date-probabilities interaction. We introduced a memetic algorithm-based…
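The withdraw-one-play-at-a-time procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and it is not CFR: it assumes a single predictor (the probability of one marker word) and swaps in an ordinary least-squares line for the regression methods the study compares. The function names `word_probabilities`, `fit_line`, and `leave_one_out_errors` are hypothetical, and the toy numbers are invented for illustration only.

```python
from collections import Counter


def word_probabilities(text, vocabulary):
    """Relative frequency of each vocabulary word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def leave_one_out_errors(xs, ys):
    """Hold out each item in turn, fit on the rest, return absolute date errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(abs(a + b * xs[i] - ys[i]))
    return errors


if __name__ == "__main__":
    # Invented toy values, NOT data from the study: the probability of one
    # hypothetical marker word per play, and a best-guess date per play.
    toy_probs = [0.010, 0.012, 0.015, 0.018, 0.020, 0.024]
    toy_dates = [1585, 1590, 1595, 1600, 1605, 1610]
    errs = leave_one_out_errors(toy_probs, toy_dates)
    print("mean absolute error (years):", sum(errs) / len(errs))
```

An 80/20 training/test evaluation follows the same pattern, except that a fixed fifth of the plays is held out once instead of each play being held out in turn.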
