Spam four ways: Making sense of text data
Nicholas J. Horton, Jie Chao, William Finzer, Phebe Palmer

TL;DR
This paper explores four innovative methods for teaching text data analysis in statistics education, focusing on spam detection using various technological tools and approaches.
Contribution
It introduces four distinct educational approaches for analyzing spam emails, integrating technology and coding to enhance understanding of text classification.
Findings
All four approaches aim to build students' understanding of text classification.
The approaches vary in their use of technology and code, but all aim at using data to make better decisions and to assess the accuracy of those decisions.
The methods demonstrate practical applications of text analytics in education.
Abstract
The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goal of using data to make better decisions and assessing the accuracy of those decisions.
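As a rough illustration of the workflow the abstract describes (derive features from subject lines, classify, then check accuracy), here is a minimal sketch. It is not taken from the paper: the paper's coding approach uses R, while Python is used here only for illustration. The subject lines, the three features, and the "any feature present" rule are all hypothetical, invented for this example.

```python
# Hypothetical toy data: (subject line, is_spam) pairs invented for illustration.
SUBJECTS = [
    ("WIN a FREE cruise now!!!", True),
    ("Meeting agenda for Tuesday", False),
    ("Claim your $1000 prize", True),
    ("Re: draft of chapter 3", False),
    ("FREE trial, act now!", True),
    ("Lunch on Friday?", False),
    ("Happy birthday!", False),  # legitimate message that the rule below gets wrong
]


def extract_features(subject):
    """Turn a subject line into simple yes/no features (assumed, not from the paper)."""
    words = subject.split()
    return {
        "has_dollar": "$" in subject,
        "has_exclaim": "!" in subject,
        # A purely alphabetic word of 2+ letters written entirely in capitals.
        "has_caps_word": any(
            w.isalpha() and w.isupper() and len(w) > 1 for w in words
        ),
    }


def classify(subject):
    """Illustrative rule: flag a message as spam if any feature is present."""
    return any(extract_features(subject).values())


def accuracy(data):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(classify(subject) == label for subject, label in data)
    return correct / len(data)


if __name__ == "__main__":
    for subject, label in SUBJECTS:
        print(f"{subject!r}: predicted={classify(subject)}, actual={label}")
    print(f"accuracy = {accuracy(SUBJECTS):.2f}")
```

The one misclassified message shows why assessing accuracy matters: a rule that looks sensible can still flag legitimate email, which is the kind of trade-off students are asked to weigh.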
Taxonomy
Topics: Statistics Education and Methodologies
