Spam four ways: Making sense of text data

Nicholas J. Horton; Jie Chao; William Finzer; Phebe Palmer

arXiv:2202.07389·stat.OT·October 11, 2022

TL;DR

This paper explores four innovative methods for teaching text data analysis in statistics education, focusing on spam detection using various technological tools and approaches.

Contribution

It introduces four distinct educational approaches for analyzing spam emails, integrating technology and coding to enhance understanding of text classification.

Findings

01. All approaches improve students' understanding of text classification.

02. Technology integration varies but collectively enhances decision-making skills.

03. The methods demonstrate practical applications of text analytics in education.

Abstract

The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goals of using data to make better decisions and assessing the accuracy of those decisions.
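The idea in the abstract of deriving features from subject lines, classifying with them, and then checking accuracy can be sketched in a few lines. This is only an illustrative sketch and not the paper's own material: the paper works with a Model Eliciting Activity, CODAP, a Shiny app, and R, whereas this example uses Python. The subject lines, labels, feature names, and the "flag if any feature is present" rule are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: (subject line, is_spam). Invented for illustration,
# not taken from the paper's dataset.
SUBJECTS = [
    ("WIN a FREE cruise now!!!", True),
    ("Meeting agenda for Tuesday", False),
    ("Claim your $1000 prize today", True),
    ("Re: draft of chapter 3", False),
    ("Lunch on Friday?", False),
    ("FREE trial, act now!", True),
    ("Free parking passes for staff", False),
    ("Cheap meds, no prescription", True),
]


def features(subject):
    """Turn a subject line into simple yes/no features (assumed, not the paper's)."""
    lower = subject.lower()
    return {
        "has_free": "free" in lower,
        "has_dollar": "$" in subject,
        "has_exclaim": "!" in subject,
    }


def predict_spam(subject):
    """Naive rule: flag the message as spam if any feature is present."""
    return any(features(subject).values())


def confusion_counts(data):
    """Count true/false positives and negatives for the rule on labeled data."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for subject, is_spam in data:
        guess = predict_spam(subject)
        if guess and is_spam:
            counts["tp"] += 1
        elif guess and not is_spam:
            counts["fp"] += 1
        elif not guess and not is_spam:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of messages the rule classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


if __name__ == "__main__":
    counts = confusion_counts(SUBJECTS)
    print(counts)
    print(accuracy(counts))
```

On this toy data the rule makes both kinds of error: a legitimate message that happens to contain "free" is flagged, and a spam message with none of the chosen features slips through. Weighing those two kinds of mistake is the sort of accuracy assessment the abstract describes.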

Taxonomy

Topics: Statistics Education and Methodologies