The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series
Pola Bereta, Ioannis Diamantis

TL;DR
This paper compares symbolic and topological clustering methods for analyzing noisy, high-dimensional Google Trends data to better understand consumer behavior patterns.
Contribution
It evaluates and contrasts SAX, eSAX, and TDA methods for clustering consumer interest time series, highlighting TDA's advantages in capturing complex structures.
Findings
TDA provides more meaningful clusters for volatile data.
SAX and eSAX are faster but less effective with noisy series.
Hybrid approaches show promise for future consumer analytics.
Abstract
Understanding temporal patterns in online search behavior is crucial for real-time marketing and trend forecasting. Google Trends offers a rich proxy for public interest, yet the high dimensionality and noise of its time-series data present challenges for effective clustering. This study evaluates three unsupervised clustering approaches, Symbolic Aggregate approXimation (SAX), enhanced SAX (eSAX), and Topological Data Analysis (TDA), applied to 20 Google Trends keywords representing major consumer categories. Our results show that while SAX and eSAX offer fast and interpretable clustering for stable time series, they struggle with volatility and complexity, often producing ambiguous "catch-all" clusters. TDA, by contrast, captures global structural features through persistent homology and achieves more balanced and meaningful groupings. We conclude with practical guidance for using…
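To make the symbolic side of the comparison concrete, the following is a minimal sketch of the standard SAX transform (z-normalization, piecewise aggregate approximation, then discretization against equiprobable Gaussian breakpoints). It is illustrative only: the function name `sax_word`, the segment count, and the four-letter alphabet are assumptions chosen for the example, not the configuration used in the paper.

```python
import numpy as np
from statistics import NormalDist


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Convert a numeric series into a SAX word (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalize; a constant series is mapped to all zeros
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # PAA: average each of n_segments roughly equal-length chunks
    paa = np.array([chunk.mean() for chunk in np.array_split(z, n_segments)])
    # Breakpoints that split N(0, 1) into len(alphabet) equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # Map each PAA value to the index of the region it falls in
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8]))  # steadily rising series -> "abcd"
```

Series with the same word (or words within a small distance of each other) can then be grouped together. Because each segment is reduced to its mean, short spikes inside a segment are averaged away, which is one plausible reason symbolic methods of this kind have difficulty with volatile series.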
Taxonomy
Topics: Advanced Text Analysis Techniques
