Estimating the Total Volume of Queries to a Search Engine

Fabrizio Lillo; Salvatore Ruggieri

arXiv:2101.09807 · cs.IR · January 26, 2021

TL;DR

This paper develops statistical methods to estimate the total number and volume of search queries in a domain, using Zipf's law and empirical data, with applications to Italian cooking queries.

Contribution

It introduces novel estimators for total query volume based on power-law models and applies them to real search data, improving query volume estimation accuracy.

Findings

01 Estimators effectively predict total query volume.

02 Methods perform well on both continuous and binned data.

03 Application to Italian cooking queries demonstrates practical utility.

Abstract

We study the problem of estimating the total number of searches (volume) of queries in a specific domain that were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows Zipf's law and that the observed sample volumes are biased according to three possible scenarios. These assumptions are consistent with empirical data, with keyword research practices, and with the approximate algorithms used to count query frequencies. Several estimators of the distribution's parameters are devised and tested, depending on the nature of the empirical or simulated data. For continuous data, we recommend nonlinear least squares (NLS) regression on the top-volume queries, where the bound on the volume is obtained from the well-known Clauset, Shalizi and Newman (CSN) estimation of power-law parameters. For binned data,…
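
As a rough illustration of the CSN step mentioned in the abstract, the sketch below computes the standard continuous maximum-likelihood estimate of a power-law exponent, alpha = 1 + n / sum(ln(x_i / xmin)), over the volumes at or above a lower bound. It is not taken from the paper or from the QVolume repository: the function names `csn_alpha_mle` and `sample_power_law` are hypothetical, and `xmin` is assumed to be supplied by the caller, whereas the full CSN procedure also selects it by minimizing a Kolmogorov-Smirnov distance.

```python
import math
import random


def csn_alpha_mle(volumes, xmin):
    """Continuous power-law MLE for the exponent, given a lower bound xmin.

    Hypothetical helper: only values >= xmin enter the estimate.
    """
    tail = [v for v in volumes if v >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail observations equal xmin; exponent undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, alpha, xmin, seed=0):
    """Draw n synthetic volumes from a continuous power law (inverse transform)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    # Synthetic data only; the exponent 2.5 and bound 10.0 are arbitrary choices.
    synthetic = sample_power_law(20000, alpha=2.5, xmin=10.0)
    print(csn_alpha_mle(synthetic, xmin=10.0))
```

On synthetic data drawn from a known exponent, the estimate should land close to the true value, which gives a quick sanity check before applying any such estimator to real query volumes.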

Code & Models

Repositories

ruggieris/QVolume (Official)

Taxonomy

Topics: Data Management and Algorithms · Complex Network Analysis Techniques · Web Data Mining and Analysis