Compressing Search with Language Models

Thomas Mulc; Jennifer L. Steele

arXiv:2407.00085·cs.IR·April 11, 2025

TL;DR

This paper introduces a novel method to compress and analyze search query data using language models, enabling accurate estimation of real-world events like car sales and flu rates without user-defined filters.

Contribution

It presents SLaM Compression for low-dimensional, memory-efficient search data representation and CoSMo for estimating real-world events solely from search data.

Findings

01 High accuracy in estimating U.S. automobile sales
02 Effective flu rate prediction from search data
03 Memory-efficient search data summaries

Abstract

Millions of people turn to Google Search each day for information on things as diverse as new cars or flu symptoms. The terms that they enter contain valuable information on their daily intent and activities, but the information in these search terms has been difficult to fully leverage. User-defined categorical filters have been the most common way to shrink the dimensionality of search data to a tractable size for analysis and modeling. In this paper we present a new approach to reducing the dimensionality of search data while retaining much of the information in the individual terms without user-defined rules. Our contributions are two-fold: 1) we introduce SLaM Compression, a way to quantify search terms using pre-trained language models and create a representation of search data that has low dimensionality, is memory efficient, and effectively acts as a summary of search, and 2) we…

Taxonomy

Topics: Topic Modeling