Multi-Facet Blending for Faceted Query-by-Example Retrieval

Heejin Do; Sangwon Ryu; Jonghwi Kim; Gary Geunbae Lee

arXiv:2412.01443·cs.IR·December 3, 2024

TL;DR

This paper introduces FaBle, a modular data augmentation method for faceted query-by-example retrieval, which synthesizes facet-specific training data using large language models, improving retrieval in new domains.

Contribution

FaBle is a novel augmentation technique that decomposes documents into facets and recomposes them to generate facet-aware training data without pre-labeled facets.

Findings

01. FaBle improves facet-conditioned embedding quality.

02. The augmentation enhances retrieval performance on a new educational dataset.

03. The method reduces reliance on domain-specific facet labels.

Abstract

With the growing demand to fit fine-grained user intents, faceted query-by-example (QBE), which retrieves similar documents conditioned on specific facets, has gained recent attention. However, prior approaches mainly depend on document-level comparisons using basic indicators like citations due to the lack of facet-level relevance datasets; yet, this limits their use to citation-based domains and fails to capture the intricacies of facet constraints. In this paper, we propose a multi-facet blending (FaBle) augmentation method, which exploits modularity by decomposing and recomposing to explicitly synthesize facet-specific training sets. We automatically decompose documents into facet units and generate (ir)relevant pairs by leveraging LLMs' intrinsic distinguishing capabilities; then, dynamically recomposing the units leads to facet-wise relevance-informed document pairs. Our…
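The decompose-and-recompose idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The facet names, the `toy_rewrite` placeholder standing in for an LLM call, the `FacetPair` record, and the blending rule (make the target facet relevant and the other facets irrelevant for a positive pair, and the reverse for a negative pair) are all assumptions made for illustration. The input is assumed to be already decomposed into facet units.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical facet names, chosen only for illustration.
FACETS = ("background", "method", "result")

# Assumed signature for the LLM step: (unit_text, facet, relevant) -> new unit text.
Rewriter = Callable[[str, str, bool], str]


@dataclass(frozen=True)
class FacetPair:
    facet: str       # facet the relevance label refers to
    query: str       # original document, recomposed from its units
    candidate: str   # synthetic document blended from rewritten units
    relevant: bool   # relevance of candidate to query on `facet`


def toy_rewrite(text: str, facet: str, relevant: bool) -> str:
    """Placeholder for an LLM call; it only tags the unit so the flow is visible."""
    tag = "rel" if relevant else "irr"
    return f"[{tag}:{facet}] {text}"


def recompose(units: Dict[str, str]) -> str:
    """Join facet units back into one document, in a fixed facet order."""
    return " ".join(units[f] for f in FACETS)


def blend(units: Dict[str, str], target_facet: str,
          rewrite: Rewriter = toy_rewrite) -> List[FacetPair]:
    """Build one positive and one negative pair for `target_facet`.

    Assumed rule: in the positive candidate only the target facet is relevant;
    in the negative candidate only the target facet is irrelevant.
    """
    if target_facet not in FACETS:
        raise ValueError(f"unknown facet: {target_facet}")
    query = recompose(units)
    pairs = []
    for label in (True, False):
        blended = {
            f: rewrite(units[f], f, label if f == target_facet else not label)
            for f in FACETS
        }
        pairs.append(FacetPair(target_facet, query, recompose(blended), label))
    return pairs


doc_units = {"background": "b", "method": "m", "result": "r"}
pairs = blend(doc_units, "method")
```

Under this assumed rule, a model trained on such pairs cannot rely on overall document similarity, because the non-target facets carry the opposite relevance signal to the label.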

Taxonomy

Topics: Web Data Mining and Analysis · Data Quality and Management · Data Management and Algorithms