Optimizing Storytelling, Improving Audience Retention, and Reducing Waste in the Entertainment Industry
Andrew Cornfeld, Ashley Miller, Mercedes Mora-Figueroa, Kurt Samuels, Anthony Palomba

TL;DR
This paper presents a machine learning framework that combines NLP features from TV episodes with traditional data to improve viewership forecasting, offering insights for better content decisions and audience retention.
Contribution
It introduces an integrated NLP and traditional data approach for TV viewership prediction, with a novel similarity scoring method for content comparison.
Findings
NLP features improve forecasting accuracy for some series.
Genre-specific performance observed across tested shows.
Provides interpretable metrics for industry decision-makers.
Abstract
Television networks face high financial risk when making programming decisions, often relying on limited historical data to forecast episodic viewership. This study introduces a machine learning framework that integrates natural language processing (NLP) features from over 25000 television episodes with traditional viewership data to enhance predictive accuracy. By extracting emotional tone, cognitive complexity, and narrative structure from episode dialogue, we evaluate forecasting performance using SARIMAX, rolling XGBoost, and feature selection models. While prior viewership remains a strong baseline predictor, NLP features contribute meaningful improvements for some series. We also introduce a similarity scoring method based on Euclidean distance between aggregate dialogue vectors to compare shows by content. Tested across diverse genres, including Better Call Saul and Abbott…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCultural Industries and Urban Development
MethodsFeature Selection
