Pairwise Judgment Formulation for Semantic Embedding Model in Web Search
Mengze Hong, Di Jiang, Zichang Guo, Chen Jason Zhang

TL;DR
This paper analyzes strategies for creating effective pairwise judgment data for Semantic Embedding Models in web search, revealing that conventional methods may not be optimal and proposing a hybrid heuristic for improved training.
Contribution
The paper systematically evaluates pairwise judgment formulation strategies for SEMs, identifies a hybrid heuristic that outperforms simpler atomic heuristics, and provides best practices for SEM training.
Findings
Conventional Learning-to-Rank formulations are not optimal for SEM training.
A hybrid heuristic outperforms simpler atomic heuristics in SEM training.
The study offers practical guidelines for constructing SEM training data from query logs.
Abstract
Semantic Embedding Models (SEMs) have become a core component in information retrieval and natural language processing due to their ability to model semantic relevance. However, despite their growing use in search engines, few studies have systematically explored how to construct effective training data for SEMs from large-scale search engine query logs. In this paper, we present a comprehensive analysis of strategies for generating pairwise judgments as SEM training data. An interesting (perhaps surprising) finding is that conventional formulation approaches used in Learning-to-Rank (LTR) are not necessarily optimal for SEM training. Through a large-scale empirical study using query logs and click-through data from a major search engine, we identify effective strategies and demonstrate the advantages of a proposed hybrid heuristic over simpler atomic heuristics. Finally,…
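To make the notion of a pairwise judgment concrete, here is a minimal Python sketch of one possible atomic heuristic: within a single query, every clicked document is preferred over every non-clicked one. This is an illustrative assumption, not necessarily one of the heuristics the paper evaluates, and the names `Impression`, `PairwiseJudgment`, and `clicked_over_nonclicked` as well as the toy log are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class Impression(NamedTuple):
    """One logged result for a query (hypothetical log schema)."""
    query: str
    doc_id: str
    clicked: bool


class PairwiseJudgment(NamedTuple):
    """For `query`, `preferred` is judged more relevant than `other`."""
    query: str
    preferred: str
    other: str


def clicked_over_nonclicked(impressions):
    """Assumed atomic heuristic: clicked documents beat non-clicked ones."""
    by_query = {}
    for imp in impressions:
        # Each query maps to a (clicked_docs, non_clicked_docs) pair of lists.
        clicked, skipped = by_query.setdefault(imp.query, ([], []))
        (clicked if imp.clicked else skipped).append(imp.doc_id)

    pairs = []
    for query, (clicked, skipped) in by_query.items():
        # Queries with no clicks (or no non-clicks) yield no pairs.
        for pos, neg in product(clicked, skipped):
            pairs.append(PairwiseJudgment(query, pos, neg))
    return pairs


# Toy click log, invented purely for illustration.
log = [
    Impression("jaguar speed", "d1", True),
    Impression("jaguar speed", "d2", False),
    Impression("jaguar speed", "d3", False),
    Impression("python sort", "d4", False),
]
pairs = clicked_over_nonclicked(log)
for p in pairs:
    print(p)
```

Each resulting triple could then feed a pairwise training objective for an embedding model. A hybrid heuristic, in the sense the abstract describes, would combine more than one such atomic rule.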
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Semantic Web and Ontologies
