Exploration of Proximity Heuristics in Length Normalization

Pranav Agrawal

arXiv:1701.01417·cs.IR·January 6, 2017

TL;DR

This paper investigates proximity heuristics in length normalization for ranking functions, demonstrating that a proximity-based ranking function can outperform BM25 by 52% on unstructured text data.

Contribution

It introduces a generalized ranking function with application-dependent features and provides guidelines for feature engineering in information retrieval.

Findings

01. The proximity heuristic improves ranking performance significantly.

02. The proposed function outperforms BM25 by 52%.

03. The paper provides guidelines for constructing features in ranking functions.

Abstract

Ranking functions from information retrieval are used primarily in search engines and are often adopted for other language processing applications. However, the features used to construct a ranking function should be analyzed before the function is applied to a data set. This paper gives guidelines for constructing generalized ranking functions with application-dependent features. Using these feature engineering guidelines, it derives a specific case of the generalized function for a recommendation system on a given data set. The behavior of both the generalized and the specific function is studied, and both are implemented on unstructured textual data. The proximity-feature-based ranking function outperforms regular BM25 by 52%.
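To make the comparison in the abstract concrete, the following is a minimal sketch of a standard Okapi BM25 baseline together with one simple proximity feature. It is not the paper's ranking function, which this page does not specify. The helper names (`bm25_scores`, `min_pair_distance`, `proximity_scores`), the additive boost `alpha / distance`, the parameter defaults, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Uses the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    terms = set(query)
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_pair_distance(query, doc):
    """Smallest token distance between occurrences of two distinct query terms.

    Returns None when fewer than two distinct query terms occur in the document.
    """
    terms = set(query)
    positions = [(i, t) for i, t in enumerate(doc) if t in terms]
    best = None
    # The closest pair of distinct terms is always adjacent in this list.
    for (i, t1), (j, t2) in zip(positions, positions[1:]):
        if t1 != t2 and (best is None or j - i < best):
            best = j - i
    return best


def proximity_scores(query, docs, alpha=1.0):
    """BM25 plus a hypothetical additive boost alpha / distance (an assumption)."""
    boosted = []
    for score, d in zip(bm25_scores(query, docs), docs):
        dist = min_pair_distance(query, d)
        boosted.append(score + (alpha / dist if dist else 0.0))
    return boosted


# Toy data, invented for illustration only.
docs = [
    "fast ranking function for search".split(),
    "ranking of documents is a function of many features and more words".split(),
    "unrelated text".split(),
]
query = ["ranking", "function"]
base = bm25_scores(query, docs)
boosted = proximity_scores(query, docs)
```

In this sketch the first document gains the largest boost because its two query terms are adjacent, while the third document keeps a score of zero. A real evaluation would need the paper's actual feature definitions and data set.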

Taxonomy

Topics: Data Management and Algorithms · Text and Document Classification Technologies · Recommender Systems and Techniques