Identifying Relevant Document Facets for Keyword-Based Search Queries

Lanbo Zhang

arXiv:1501.00744·cs.IR·January 6, 2015

Open Access

TL;DR

This paper addresses the problem of identifying which document facet-value pairs are relevant to a keyword query over structured documents, and proposes a machine learning method for it, evaluated on a movie data set.

Contribution

It introduces a novel machine learning approach with feature engineering to detect relevant facet-value pairs in keyword-based search queries.

Findings

01. Effective identification of relevant facets improves search accuracy.

02. The approach outperforms baseline methods on the INEX movie dataset.

03. Features significantly contribute to the model's success.

Abstract

As structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Although advanced search interfaces are widely available, most users still prefer to use keyword-based queries to search those documents. Query keywords often imply some hidden restrictions on the desired documents, which can be represented as document facet-value pairs. To achieve high retrieval performance, it's important to be able to identify the relevant facet-value pairs hidden in a query. In this paper, we study the problem of identifying document facet-value pairs that are relevant to a keyword-based search query. We propose a machine learning approach and a set of useful features, and evaluate our approach using a movie data set from INEX.
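
To make the notion of a facet-value pair "hidden" in a query concrete, the following is a minimal Python sketch, not the paper's method. It enumerates the facet-value pairs of a toy movie collection that share at least one term with the query and computes three simple overlap features for each. The collection `MOVIES`, the function `candidate_pairs`, and the feature names (`value_coverage`, `query_coverage`, `doc_fraction`) are all hypothetical and chosen only for illustration; the abstract does not list the features the paper actually uses.

```python
from collections import Counter

# Toy collection (hypothetical data): each document maps a facet to its values.
MOVIES = [
    {"title": ["Heat"], "actor": ["Al Pacino", "Robert De Niro"], "genre": ["crime"]},
    {"title": ["Scarface"], "actor": ["Al Pacino"], "genre": ["crime", "drama"]},
    {"title": ["Casino"], "actor": ["Robert De Niro"], "genre": ["crime"]},
]


def tokens(text):
    """Lowercase whitespace tokenization; a real system would do more."""
    return text.lower().split()


def candidate_pairs(query, docs):
    """Return {(facet, value): features} for pairs sharing a term with the query."""
    query_terms = set(tokens(query))

    # Count how many documents carry each facet-value pair.
    doc_freq = Counter()
    for doc in docs:
        for facet, values in doc.items():
            for value in set(values):
                doc_freq[(facet, value)] += 1

    candidates = {}
    for (facet, value), df in doc_freq.items():
        value_terms = set(tokens(value))
        overlap = query_terms & value_terms
        if overlap:
            candidates[(facet, value)] = {
                # Fraction of the value's terms that appear in the query.
                "value_coverage": len(overlap) / len(value_terms),
                # Fraction of the query's terms explained by this value.
                "query_coverage": len(overlap) / len(query_terms),
                # Fraction of documents that carry this pair.
                "doc_fraction": df / len(docs),
            }
    return candidates


pairs = candidate_pairs("pacino crime movies", MOVIES)
for pair, features in sorted(pairs.items()):
    print(pair, features)
```

For the query "pacino crime movies", the sketch surfaces the pairs (actor, Al Pacino) and (genre, crime). In a learning setup of the kind the abstract describes, feature vectors like these would be fed to a trained classifier or ranker that decides which candidate pairs are actually relevant to the query; that step is omitted here.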

Taxonomy

Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques