Exposing Query Identification for Search Transparency
Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega

TL;DR
This paper investigates approximate methods for identifying the queries that expose a specific piece of content in search results, with the aim of improving search transparency and supporting the analysis of issues such as search bias and privacy.
Contribution
The paper introduces a retrieval-based approach to approximate exposing query identification (EQI) that reverses the roles of queries and documents, studies it for dual-encoder and BM25 models, and improves it with metric learning.
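The role reversal can be illustrated with a minimal sketch, assuming a small in-memory log of queries and a textbook BM25 scorer. The class `BM25Index`, the function `candidate_exposing_queries`, and the parameters `k1`, `b`, and `top_n` are hypothetical names chosen for illustration, not the authors' implementation. The queries are indexed as if they were documents, and the document is used as the search key.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use richer analysis)."""
    return text.lower().split()


class BM25Index:
    """Minimal BM25 index over short texts. Here the indexed items are queries."""

    def __init__(self, texts, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.items = [Counter(tokenize(t)) for t in texts]
        self.lengths = [sum(c.values()) for c in self.items]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        n = len(self.items)
        # Number of indexed items containing each term.
        df = Counter(term for c in self.items for term in c)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, key_terms, i):
        """BM25 score of indexed item i for the given search-key terms."""
        tf, length = self.items[i], self.lengths[i]
        norm = self.k1 * (1 - self.b + self.b * length / self.avg_len)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in set(key_terms)
            if t in tf
        )

    def search(self, key_text, top_n):
        """Return indices of the top_n items with a positive score."""
        key_terms = tokenize(key_text)
        scored = [(self.score(key_terms, i), i) for i in range(len(self.items))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:top_n]]


def candidate_exposing_queries(document, queries, top_n=2):
    """Reversed retrieval: index the queries, search with the document."""
    index = BM25Index(queries)
    return [queries[i] for i in index.search(document, top_n)]


queries = [
    "cheap flights paris",
    "python bm25 tutorial",
    "bm25 ranking function",
    "chocolate cake recipe",
]
document = "a tutorial on the bm25 ranking function in python"
candidates = candidate_exposing_queries(document, queries, top_n=2)
```

The returned queries are only candidates. Whether a candidate actually exposes the document still depends on how the original search system ranks documents for that query.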
Findings
Approximate EQI is feasible with both dense dual-encoder and traditional BM25 models.
Metric learning improves retrieval accuracy.
Empirical analysis highlights practical aspects of EQI.
Abstract
Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and…
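To make the cost of exact identification concrete, the following is a minimal sketch of a brute-force baseline. It assumes that a query "exposes" a document when the document appears in the top-k results for that query; the cutoff `k`, the toy term-overlap scorer, and all function names are assumptions made for illustration and are not taken from the paper. Every known query has to be run against the full collection, which is what becomes expensive at scale.

```python
def overlap_score(query, document):
    """Toy relevance score: number of distinct query terms found in the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def rank(query, documents, k):
    """Ids of the top-k documents with a positive toy score (ties broken by id)."""
    scored = [(overlap_score(query, d), i) for i, d in enumerate(documents)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def exact_exposing_queries(doc_id, queries, documents, k=1):
    """Brute force: issue every query and keep those whose top-k contains doc_id."""
    return [q for q in queries if doc_id in rank(q, documents, k)]


documents = [
    "dense retrieval with dual encoders",
    "bm25 ranking function",
    "chocolate cake recipe",
]
queries = ["dual encoders", "bm25 function", "cake recipe", "dense bm25"]
exposing_doc0 = exact_exposing_queries(0, queries, documents, k=1)
exposing_doc2 = exact_exposing_queries(2, queries, documents, k=1)
```

In this sketch the work grows with the number of queries times the number of documents, and it has to be repeated whenever the collection or the ranker changes.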
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Mobile Crowdsensing and Crowdsourcing
