Language-based Examples in the Statistics Classroom
Roger Bilisoly

TL;DR
This paper explores the use of language-based examples in statistics education, leveraging web resources and statistical software to analyze wordplay, idiomatic pairs, and pangrams for teaching purposes.
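Crossword-style pattern matching reduces to a regular expression applied to a word list. The sketch below is an illustration in Python, not code from the paper. The function name `match_pattern`, the `_` wildcard convention, and the tiny word list are hypothetical stand-ins for a real dictionary file downloaded from the Web.

```python
import re

def match_pattern(pattern, words):
    """Return the words that fit a crossword-style pattern.

    '_' marks an unknown letter and every other character must match exactly
    (a hypothetical convention chosen for this sketch).
    """
    # Each '_' becomes '.', which matches any single character; other characters are escaped.
    regex = re.compile("".join("." if ch == "_" else re.escape(ch) for ch in pattern.lower()))
    # fullmatch enforces the word length as well as the fixed letters.
    return [w for w in words if regex.fullmatch(w.lower())]

# Hypothetical word list; in class this would be a dictionary file from the Web.
words = ["cat", "cut", "cot", "coat", "act", "cast"]
print(match_pattern("c_t", words))  # ['cat', 'cut', 'cot']
```

Counting the matches for many patterns turns a word game into a small data set, for example showing how quickly the number of candidates shrinks as letters are revealed. Hangman adds one more constraint that this sketch ignores: a blank cannot hold a letter that has already been revealed elsewhere in the word.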
Contribution
It introduces novel language-based examples for teaching statistics: matching words to letter patterns, testing how often idiomatic word pairs co-occur, and measuring the lengths of pangrams in a literary text (Charles Dickens' A Christmas Carol).
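The abstract does not say which test produced the p-value for to throw up, so the following is only one plausible sketch: a one-sided binomial test that treats each adjacent word pair as an independent draw in which "w1 w2" appears with probability (count(w1)/N)(count(w2)/N), where N is the number of tokens. The function names and the toy sentence are hypothetical. A real analysis would tokenize the Brown Corpus and would also have to decide how to handle inflected forms (threw up, throws up) and separated particles (throw it up).

```python
import math

def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i successes."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binom_upper_tail(k, n, p, rel_tol=1e-15):
    """P(X >= k) for X ~ Binomial(n, p), summing the upper tail directly.

    Summing from k upward keeps very small p-values accurate, unlike 1 - CDF.
    """
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Once i exceeds the mean, the terms only decrease; stop when they are negligible.
        if i > n * p and term <= total * rel_tol:
            break
    return total

def collocation_test(tokens, w1, w2):
    """Compare how often w2 directly follows w1 with an independence model.

    Returns (observed count, expected count, one-sided p-value).
    """
    n = len(tokens)
    p = (tokens.count(w1) / n) * (tokens.count(w2) / n)  # chance of "w1 w2" at one position
    positions = n - 1                                      # number of adjacent word pairs
    observed = sum(1 for a, b in zip(tokens, tokens[1:]) if a == w1 and b == w2)
    if p == 0.0:
        return observed, 0.0, 1.0  # one of the words never occurs
    return observed, positions * p, binom_upper_tail(observed, positions, p)

tokens = "i will throw up if i throw the ball up".split()
print(collocation_test(tokens, "throw", "up"))
```

A p-value near zero means the pair appears together far more often than the independence model allows, which is the sense in which idiomatic pairs occur "more frequently than chance." On a ten-word toy sentence like this one, the evidence is naturally weak (the p-value is about 0.31).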
Findings
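Many kinds of wordplay, such as crosswords and hangman, reduce to finding words whose letters fit a pattern, a task that the string functions of statistical packages handle directly.
Idiomatic word pairs appear together more often than chance predicts; in the Brown Corpus, for example, the phrasal verb to throw up has a p-value of 7.92E-10.
Pangram lengths in Dickens' A Christmas Carol align with the expected value from the unequal-probability coupon collector's problem and with simulations.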
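One way to search a text such as A Christmas Carol for pangrams is to slide a window over its letters: for each starting letter, count how many letters must be read before all 26 have appeared. This definition of pangram length (spaces and punctuation dropped, case ignored) is an assumption of this sketch, and the paper's exact convention may differ. `pangram_lengths` is a hypothetical helper name, and the text would be a plain-text copy of the novel downloaded from the Web.

```python
import string
from collections import Counter

def pangram_lengths(text):
    """For each starting letter, the number of letters read until all 26 have appeared.

    Non-letters are dropped and case is ignored (assumptions of this sketch).
    Starting positions from which no pangram can be completed are omitted.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter()  # letter -> occurrences inside the current window
    lengths = []
    end = 0             # the window is letters[start:end]; end never needs to move back
    for start in range(len(letters)):
        # Grow the window until it holds all 26 letters or the text runs out.
        while len(counts) < 26 and end < len(letters):
            counts[letters[end]] += 1
            end += 1
        if len(counts) < 26:
            break  # no later start can complete a pangram either
        lengths.append(end - start)
        # Slide forward: drop letters[start] from the window.
        counts[letters[start]] -= 1
        if counts[letters[start]] == 0:
            del counts[letters[start]]
    return lengths

print(pangram_lengths("The quick brown fox jumps over the lazy dog"))  # [35, 34, 33, 32]
```

Each entry is one observed value of a coupon-collector waiting time. Overlapping windows share letters, though, so the entries are not independent draws, which matters when summarizing them.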
Abstract
Statistics pedagogy values using a variety of examples. Because text resources are available on the Web and statistical packages can analyze string data, it is now easy to use language-based examples in a statistics class. Three such examples are discussed here. First, many types of wordplay (e.g., crosswords and hangman) involve finding words whose letters satisfy a certain pattern. Second, linguistics has shown that idiomatic pairs of words often appear together more frequently than chance. For example, in the Brown Corpus, this is true of the phrasal verb to throw up (p-value = 7.92E-10). Third, a pangram contains every letter of the alphabet at least once. Pangrams are searched for in Charles Dickens' A Christmas Carol, and their lengths are compared with the expected value given by the unequal-probability coupon collector's problem as well as with simulations.
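For the unequal-probability coupon collector's problem, if letter j is drawn with probability p_j (the p_j summing to 1), the expected number of draws T needed to see every letter has the standard integral form E[T] = integral from 0 to infinity of (1 - prod_j (1 - exp(-p_j t))) dt. The sketch below evaluates that integral with the trapezoid rule and checks it by Monte Carlo simulation. The function names, step size, tolerance, and trial count are assumptions, not the paper's code; in practice the p_j would be letter frequencies estimated from the novel itself, and every p_j must be positive.

```python
import math
import random
from itertools import accumulate

def expected_draws(probs, step=0.5, tol=1e-12):
    """E[T] for the unequal-probability coupon collector's problem.

    Uses E[T] = integral_0^inf (1 - prod_j (1 - exp(-p_j t))) dt, evaluated
    with the trapezoid rule. probs are normalized to sum to 1; all must be > 0.
    """
    total_weight = sum(probs)
    ps = [p / total_weight for p in probs]

    def integrand(t):
        prod = 1.0
        for p in ps:
            prod *= 1.0 - math.exp(-p * t)
        return 1.0 - prod

    # The integrand is at most sum_j exp(-p_j t) <= len(ps) * exp(-p_min t),
    # so stopping at t_max drops a tail of at most tol / p_min.
    t_max = math.log(len(ps) / tol) / min(ps)
    n_steps = int(t_max / step) + 1
    area = 0.5 * (integrand(0.0) + integrand(n_steps * step))
    for i in range(1, n_steps):
        area += integrand(i * step)
    return area * step

def simulate_draws(probs, n_trials=1000, seed=1):
    """Monte Carlo estimate of E[T]: draw letters until every one has been seen."""
    rng = random.Random(seed)
    k = len(probs)
    cum = list(accumulate(probs))  # cumulative weights for rng.choices
    total = 0
    for _ in range(n_trials):
        seen = set()
        draws = 0
        while len(seen) < k:
            seen.add(rng.choices(range(k), cum_weights=cum)[0])
            draws += 1
        total += draws
    return total / n_trials

# With equal probabilities, E[T] = 26 * (1 + 1/2 + ... + 1/26), about 100.2.
uniform = [1 / 26] * 26
print(expected_draws(uniform), simulate_draws(uniform))
```

The integral avoids inclusion-exclusion over all 2^26 subsets of letters, which would be impractical for the full alphabet. Any gap between this benchmark and the observed pangram lengths may partly reflect that letters in English text are not independent draws.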
Taxonomy
Topics: Statistics Education and Methodologies
