TL;DR
This paper introduces a deep learning approach using CNNs to classify emails as human- or machine-generated, raising adjusted-recall from 70.5% to 78.8% and precision from 94.7% to 96.0%, and outperforming BERT on this task.
Contribution
The paper presents a novel multi-model CNN framework for email classification, combining content, sender, action, and salutation signals, and demonstrates its superior performance over existing models.
Findings
Full model improves adjusted-recall from 70.5% to 78.8%.
Full model increases precision from 94.7% to 96.0%.
Outperforms state-of-the-art BERT model.
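As a quick arithmetic check on the reported figures, the sketch below converts each before/after pair into an absolute gain (percentage points) and a relative gain (percent of the starting value). The helper name `gain` is hypothetical and only illustrates the calculation; the input numbers are the ones listed above.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = after - before
    relative = absolute / before * 100.0
    return round(absolute, 1), round(relative, 1)


# Figures taken from the findings list above.
adjusted_recall_gain = gain(70.5, 78.8)
precision_gain = gain(94.7, 96.0)

print("adjusted-recall:", adjusted_recall_gain)
print("precision:", precision_gain)
```

Read this way, the adjusted-recall change is the larger of the two improvements, while precision was already high and moves by a little over one point.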
Abstract
It is an essential product requirement of Yahoo Mail to distinguish between personal and machine-generated emails. The old production classifier in Yahoo Mail was based on a simple logistic regression model. That model was trained by aggregating features at the SMTP address level. We propose building deep learning models at the message level. We built and trained four individual CNN models: (1) a content model with subject and content as input; (2) a sender model with sender email address and name as input; (3) an action model by analyzing email recipients' action patterns and correspondingly generating target labels based on senders' opening/deleting behaviors; (4) a salutation model by utilizing senders' "explicit salutation" signal as positive labels. Next, we built a final full model after exploring different combinations of the above four models. Experimental results on editorial…
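The abstract says a final full model was built from combinations of the four individual models, but this excerpt does not say how they are combined. As a minimal sketch under that caveat, the code below shows one common option, late fusion, where each model's probability is turned into a logit and the logits are combined with weights. The function `fuse_scores`, the weights, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a probability, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Map a real-valued score back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_scores(scores: Dict[str, float],
                weights: Dict[str, float],
                bias: float = 0.0) -> float:
    """Hypothetical late fusion: weighted sum of per-model logits, then sigmoid.

    `scores` maps a model name to its probability that the message is
    human-generated; `weights` maps the same names to fusion weights.
    """
    z = bias + sum(w * logit(scores[name]) for name, w in weights.items())
    return sigmoid(z)


# Illustrative values only (assumed, not from the paper).
model_scores = {"content": 0.92, "sender": 0.80, "action": 0.65, "salutation": 0.97}
equal_weights = {name: 0.25 for name in model_scores}

print(f"fused probability: {fuse_scores(model_scores, equal_weights):.3f}")
```

In practice the fusion weights would be learned on held-out data rather than fixed by hand. Combining in logit space keeps a single very confident model from being averaged away as strongly as it would be with a plain mean of probabilities.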
Taxonomy
Methods
Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Layer Normalization · Weight Decay
