Exposing Query Identification for Search Transparency

Ruohan Li; Jianxiang Li; Bhaskar Mitra; Fernando Diaz; Asia J. Biega

arXiv:2110.07701 · cs.IR · April 12, 2022 · 1 citation

TL;DR

This paper investigates approximate methods for identifying queries that expose specific content in search results, aiming to enhance transparency and address issues like bias and privacy.

Contribution

The paper introduces a retrieval-based approach to approximate exposing query identification (EQI): the roles of queries and documents are reversed in dual-encoder and BM25 models, and metric learning is used to improve retrieval.

Findings

01. Approximate EQI is feasible with both dense dual-encoder and traditional BM25 models.

02. Metric learning improves retrieval accuracy.

03. Empirical analysis highlights practical aspects of EQI.

Abstract

Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and…
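
The contrast between exact and approximate EQI described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a plain dot product over hand-written toy vectors stands in for a trained dual encoder, and the function names (`exact_exposing_queries`, `approximate_exposing_queries`), the toy data, and the cutoffs `k` and `n` are all hypothetical. The exact version runs every query through the ranker, while the approximate version treats the document as the "query" and retrieves the highest-scoring queries directly.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product relevance score (an assumed stand-in for a dual encoder)."""
    return sum(x * y for x, y in zip(a, b))


def exact_exposing_queries(
    doc_id: str, docs: Dict[str, Vector], queries: Dict[str, Vector], k: int
) -> List[str]:
    """Brute force: rank all documents for every query and keep the
    queries that place doc_id within the top k results."""
    exposing = []
    for qid, qvec in queries.items():
        ranked = sorted(docs, key=lambda d: dot(qvec, docs[d]), reverse=True)
        if doc_id in ranked[:k]:
            exposing.append(qid)
    return exposing


def approximate_exposing_queries(
    doc_id: str, docs: Dict[str, Vector], queries: Dict[str, Vector], n: int
) -> List[str]:
    """Reversed retrieval: score every query against the document
    and return the n highest-scoring queries as candidates."""
    dvec = docs[doc_id]
    ranked = sorted(queries, key=lambda q: dot(queries[q], dvec), reverse=True)
    return ranked[:n]


# Toy vectors, invented purely for illustration.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = {
    "q1": [1.0, 0.1],
    "q2": [0.1, 1.0],
    "q3": [0.6, 0.6],
    "q4": [0.9, 0.3],
}

exact = exact_exposing_queries("d3", docs, queries, k=1)
approx = approximate_exposing_queries("d3", docs, queries, n=2)
print("exact:", exact)
print("approximate candidates:", sorted(approx))
```

In this toy data the reversed retrieval for `d3` returns one query that does not actually expose it, because that query scores well against `d3` but even better against `d1`. A high query-document score alone does not guarantee a top-k rank, which is one way the approximate result can diverge from the exact one.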

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Mobile Crowdsensing and Crowdsourcing