Compressing Search with Language Models
Thomas Mulc, Jennifer L. Steele

TL;DR
This paper introduces a method for compressing and analyzing search query data with language models, enabling accurate estimation of real-world events such as car sales and flu rates without user-defined filters.
Contribution
It presents SLaM Compression, which builds low-dimensional, memory-efficient representations of search data, and CoSMo, which estimates real-world events solely from search data.
Findings
High accuracy in estimating U.S. automobile sales
Effective flu rate prediction from search data
Memory-efficient search data summaries
Abstract
Millions of people turn to Google Search each day for information on things as diverse as new cars or flu symptoms. The terms that they enter contain valuable information on their daily intent and activities, but the information in these search terms has been difficult to fully leverage. User-defined categorical filters have been the most common way to shrink the dimensionality of search data to a tractable size for analysis and modeling. In this paper we present a new approach to reducing the dimensionality of search data while retaining much of the information in the individual terms without user-defined rules. Our contributions are two-fold: 1) we introduce SLaM Compression, a way to quantify search terms using pre-trained language models and create a representation of search data that has low dimensionality, is memory efficient, and effectively acts as a summary of search, and 2) we…
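The abstract does not spell out how SLaM Compression combines term-level language-model outputs into a single summary. The sketch below is a minimal illustration under stated assumptions, not necessarily the paper's method. It assumes each search term is embedded by a pre-trained language model and that one period of search is summarized as the search-volume-weighted mean of those embeddings, so the summary stays fixed-size however many distinct terms appear. `embed_term` is a hypothetical hash-seeded stand-in for a real model encoder. `compress_search`, `EMBED_DIM`, and the example terms are also illustrative names only.

```python
import hashlib
import math
import random

EMBED_DIM = 8  # hypothetical size; real language-model embeddings are far wider


def embed_term(term: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for a pre-trained language model encoder.

    Seeds a PRNG from a hash of the term so the same term always maps to the
    same unit-length vector. A real pipeline would call an actual model here.
    """
    seed = int.from_bytes(hashlib.sha256(term.encode("utf-8")).digest()[:8], "little")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def compress_search(term_counts: dict[str, int], dim: int = EMBED_DIM) -> list[float]:
    """Summarize a period of search as the volume-weighted mean term embedding.

    Assumption for illustration: weighting by raw search counts. The output has
    length `dim` no matter how many distinct terms were searched.
    """
    total = sum(term_counts.values())
    summary = [0.0] * dim
    if total == 0:
        return summary  # no searches: return the zero vector
    for term, count in term_counts.items():
        for i, x in enumerate(embed_term(term, dim)):
            summary[i] += count * x
    return [s / total for s in summary]


if __name__ == "__main__":
    # Hypothetical weekly term counts, for illustration only.
    week = {"new suv prices": 120, "flu symptoms": 45, "car dealership near me": 80}
    vec = compress_search(week)
    print(len(vec))  # → 8
```

Averaging keeps memory proportional to the embedding width rather than to the number of distinct queries, which is one way to obtain a low-dimensional, memory-efficient summary of search; the paper's actual weighting and aggregation may differ.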
Taxonomy
Topics: Topic Modeling
