How Many Folders Do You Really Need?
Mihajlo Grbovic, Guy Halawi, Zohar Karnin, Yoelle Maarek

TL;DR
This paper presents a large-scale email classification system that automatically distinguishes between personal and machine-generated emails, categorizing messages into latent categories with high accuracy, based on analysis of over 500 billion messages.
Contribution
It introduces a novel approach for automatic email categorization into latent categories without user-defined folders, leveraging machine learning on a Web-scale dataset.
Findings
Achieved nearly 90% precision and recall in email classification.
Latent categories explain 70% of email traffic and search queries.
Discovered 6 latent categories (one for human-generated and five for machine-generated messages) that cover a significant portion of email traffic.
Abstract
Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently, however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approaches, because of the change in the nature of consumer email traffic, which is now dominated by (non-spam) machine-generated email. We propose here a novel approach for (1) automatically distinguishing between personal and machine-generated email and (2) classifying messages into latent categories, without requiring users to have defined any folder. We report how we have discovered that a set of 6 "latent" categories (one for human- and the others for machine-generated messages) can explain a significant portion of email traffic. We describe in…
