Machine Learning to Predict Digital Frustration from Clickstream Data
Jibin Joseph

TL;DR
This paper develops machine learning models, including XGBoost and an LSTM, to predict user frustration in e-commerce sessions from clickstream data. The models reach high accuracy and remain effective when given only early-session data.
Contribution
The paper introduces an approach that combines rule-based frustration labeling with tree-based and deep learning models for early prediction of user frustration.
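Early prediction can be framed as computing features from only the first k events of a session while keeping the label derived from the whole session. The following is a minimal Python sketch under stated assumptions: the event schema (dicts with `event_type` and `timestamp`), the event type names, and the helpers `prefix_features` and `prefix_dataset` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event schema (an assumption, not the paper's): each event is a
# dict with an "event_type" string and a numeric "timestamp" in seconds.
Event = Dict[str, object]


def prefix_features(session: List[Event], k: int) -> Dict[str, float]:
    """Tabular features computed from only the first k events of a session."""
    prefix = session[:k]
    counts = Counter(str(e["event_type"]) for e in prefix)
    n = len(prefix)
    # Elapsed time between the first and last event inside the prefix.
    duration = (
        float(prefix[-1]["timestamp"]) - float(prefix[0]["timestamp"]) if n >= 2 else 0.0
    )
    adds, removes = counts["cart"], counts["remove_from_cart"]
    return {
        "n_events": float(n),
        "duration_s": duration,
        "n_views": float(counts["view"]),
        "n_cart_adds": float(adds),
        "n_cart_removes": float(removes),
        # Removals per addition; 0.0 when nothing was added to the cart.
        "cart_remove_ratio": removes / adds if adds else 0.0,
    }


def prefix_dataset(
    sessions: List[List[Event]], labels: List[int], k: int
) -> List[Tuple[Dict[str, float], int]]:
    """Pair each session's first-k features with its full-session label."""
    return [(prefix_features(s, k), y) for s, y in zip(sessions, labels)]
```

Building one dataset per value of k and training the same classifier on each is one plausible way to measure how accuracy grows as more of the session is observed.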
Findings
XGBoost achieves about 90% accuracy and a ROC AUC of 0.9579.
LSTM achieves about 91% accuracy and a ROC AUC of 0.9705.
Early prediction is effective using only the first 20 to 30 interactions of a session.
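ROC AUC is threshold-free: it equals the probability that a randomly chosen frustrated session receives a higher model score than a randomly chosen non-frustrated one. The sketch below computes it with the rank-sum (Mann-Whitney U) formulation, averaging ranks for tied scores. It is an illustration of the metric, assuming labels are 0/1; the paper does not say which implementation it used (scikit-learn's `roc_auc_score` is a common choice).

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; labels must be 0 or 1."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (positive, negative) pairs where the positive scores higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Sorting keeps this O(n log n), so it stays practical at the scale of hundreds of thousands of sessions, unlike a naive pairwise comparison.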
Abstract
Many businesses depend on their mobile apps and websites, so user frustration while trying to complete a task on these channels can cause lost sales and complaints. In this research, I use clickstream data from a real e-commerce site to predict whether a session is frustrated. I define frustration using rules based on rage bursts, back-and-forth navigation (U-turns), cart churn, search struggle, and long wandering sessions, and apply these rules to 5.4 million raw clickstream events (304,881 sessions). From each session, I build tabular features and train standard classifier models. I also use the full event sequence to train a discriminative LSTM classifier. XGBoost reaches about 90% accuracy and a ROC AUC of 0.9579, while the LSTM performs best with about 91% accuracy and a ROC AUC of 0.9705. Finally, the research shows that with only the first 20 to 30 interactions, the…
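The abstract names five rule families but not their thresholds or how they are combined. The sketch below shows one possible rule-based labeler in Python. The `Event` schema, every threshold constant, the A→B→A definition of a U-turn, and the choice to flag a session when any single rule fires are all hypothetical choices made for illustration, not the paper's settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    event_type: str  # hypothetical: "view", "cart", "remove_from_cart", "search", "purchase"
    item: str        # page or product identifier
    ts: float        # timestamp in seconds; sessions are assumed sorted by ts


# All thresholds below are illustrative assumptions, not the paper's values.
RAGE_WINDOW_S = 3.0
RAGE_MIN_EVENTS = 5
MIN_U_TURNS = 2
MIN_CART_REMOVES = 2
MIN_FAILED_SEARCHES = 3
WANDER_MIN_EVENTS = 50


def has_rage_burst(s: List[Event]) -> bool:
    # True if any RAGE_MIN_EVENTS consecutive events span <= RAGE_WINDOW_S seconds.
    for i in range(len(s) - RAGE_MIN_EVENTS + 1):
        if s[i + RAGE_MIN_EVENTS - 1].ts - s[i].ts <= RAGE_WINDOW_S:
            return True
    return False


def count_u_turns(s: List[Event]) -> int:
    # Count A -> B -> A patterns in the sequence of visited items.
    items = [e.item for e in s]
    return sum(
        1
        for i in range(len(items) - 2)
        if items[i] == items[i + 2] and items[i] != items[i + 1]
    )


def cart_churn(s: List[Event]) -> bool:
    return sum(e.event_type == "remove_from_cart" for e in s) >= MIN_CART_REMOVES


def search_struggle(s: List[Event]) -> bool:
    # Many searches that never lead to a cart addition or purchase.
    searches = sum(e.event_type == "search" for e in s)
    converted = any(e.event_type in ("cart", "purchase") for e in s)
    return searches >= MIN_FAILED_SEARCHES and not converted


def long_wandering(s: List[Event]) -> bool:
    # A long session that ends without any purchase.
    return len(s) >= WANDER_MIN_EVENTS and not any(e.event_type == "purchase" for e in s)


def label_frustrated(s: List[Event]) -> int:
    """Return 1 if any rule fires for the session, else 0."""
    rules = [
        has_rage_burst(s),
        count_u_turns(s) >= MIN_U_TURNS,
        cart_churn(s),
        search_struggle(s),
        long_wandering(s),
    ]
    return int(any(rules))
```

Because the labels come from deterministic rules over the whole session, a classifier trained on these labels is learning to anticipate the rules from partial or aggregated evidence.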
Taxonomy
Topics: Personal Information Management and User Behavior · Spam and Phishing Detection · Mind wandering and attention
