Data scraping, ingestation, and modeling: bringing data from cars.com into the intro stats class
Sarah McDonald, Nicholas Jon Horton

TL;DR
This paper presents a classroom activity where students manually scrape car data from cars.com, ingest it into R, and analyze relationships between price, mileage, and model year to enhance statistical learning through hands-on experience.
Contribution
It introduces a practical, hands-on activity integrating data scraping, ingestion, and analysis into introductory statistics education, bridging theory and real-world data skills.
Findings
Students gain experience in data collection and analysis.
The activity lets students explore how price relates to mileage and model year for a selected type of car.
The activity enhances students' understanding of the statistical analysis cycle.
Abstract
New tools have made it much easier for students to develop skills to work with interesting data sets as they begin to extract meaning from data. To fully appreciate the statistical analysis cycle, students benefit from repeated experiences collecting, ingesting, wrangling, and analyzing data, and communicating results. How can we bring such opportunities into the classroom? We describe a classroom activity, originally developed by Danny Kaplan (Macalester College), in which students can expand upon statistical problem solving by hand-scraping data from cars.com, ingesting these data into R, then carrying out analyses of the relationships between price, mileage, and model year for a selected type of car.
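The ingest-then-model step described in the abstract can be sketched in a few lines. The activity itself uses R; the version below is a minimal Python sketch, not the authors' code. The column names (price, mileage, year), the helper names read_listings and fit_line, and every data value are hypothetical and chosen only for illustration; none of them come from cars.com.

```python
import csv
import io

# Hypothetical hand-entered records standing in for hand-scraped listings.
# These values are illustrative only and are NOT real cars.com data.
RAW_CSV = """price,mileage,year
21500,18000,2021
18900,35000,2019
16800,52000,2018
13500,78000,2016
10900,101000,2014
"""


def read_listings(text):
    """Ingest CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


listings = read_listings(RAW_CSV)
mileage = [row["mileage"] for row in listings]
price = [row["price"] for row in listings]

# The slope estimates the change in price per additional mile.
intercept, slope = fit_line(mileage, price)
print(f"price ~ {intercept:.0f} + {slope:.4f} * mileage")
```

The same fit_line helper could be applied to model year in place of mileage. A model using both predictors at once would need multiple regression, which this sketch does not cover.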
