Estimating the Total Volume of Queries to a Search Engine
Fabrizio Lillo, Salvatore Ruggieri

TL;DR
This paper develops statistical methods to estimate the total number and volume of search queries in a domain, using Zipf's law and empirical data, with applications to Italian cooking queries.
Contribution
It introduces novel estimators for total query volume based on power-law models and applies them to real search data, improving query volume estimation accuracy.
Findings
Estimators effectively predict total query volume.
Methods perform well on both continuous and binned data.
Application to Italian cooking queries demonstrates practical utility.
Abstract
We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows Zipf's law, and that the observed sample volumes are biased according to three possible scenarios. These assumptions are consistent with empirical data, with keyword research practices, and with approximate algorithms used to count query frequencies. Several estimators of the parameters of the distribution are devised and tested, depending on the nature of the empirical or simulated data. For continuous data, we recommend using nonlinear least squares (NLS) regression on the top-volume queries, where the bound on the volume is obtained from the well-known Clauset, Shalizi and Newman (CSN) estimation of power-law parameters. For binned data,…
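The NLS step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rank-volume form of Zipf's law, v(r) = c * r^(-alpha), it uses a hypothetical fixed cutoff `top_k` in place of the CSN-derived bound on the volume, and it treats the total number of queries `n_queries` as known. The function names and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit


def zipf(rank, c, alpha):
    """Assumed rank-volume form of Zipf's law: v(r) = c * r^(-alpha)."""
    return c * rank ** (-alpha)


def fit_zipf_nls(volumes, top_k):
    """Fit (c, alpha) by nonlinear least squares on the top_k largest volumes.

    top_k is a hypothetical stand-in for the volume bound that the paper
    obtains from the CSN power-law estimation.
    """
    v = np.sort(np.asarray(volumes, dtype=float))[::-1][:top_k]
    ranks = np.arange(1, len(v) + 1, dtype=float)
    (c, alpha), _ = curve_fit(zipf, ranks, v, p0=(v[0], 1.0))
    return c, alpha


def total_volume(c, alpha, n_queries):
    """Sum the fitted law over ranks 1..n_queries (n_queries assumed known)."""
    ranks = np.arange(1, n_queries + 1, dtype=float)
    return float(np.sum(zipf(ranks, c, alpha)))


# Synthetic, noise-free volumes (illustrative values, not data from the paper).
true_c, true_alpha = 10000.0, 1.2
all_ranks = np.arange(1, 501)
volumes = true_c * all_ranks ** (-true_alpha)

# Fit on the 100 highest-volume queries, then extrapolate to all 500.
c_hat, alpha_hat = fit_zipf_nls(volumes, top_k=100)
est_total = total_volume(c_hat, alpha_hat, n_queries=500)
print(f"alpha = {alpha_hat:.3f}, estimated total volume = {est_total:.1f}")
```

Fitting only the head of the distribution reflects the abstract's recommendation to regress on top-volume queries; with real, noisy or biased samples the choice of cutoff matters and the recovered parameters would not be exact as they are here.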
Taxonomy
Topics: Data Management and Algorithms · Complex Network Analysis Techniques · Web Data Mining and Analysis
