Exploration of Proximity Heuristics in Length Normalization
Pranav Agrawal

TL;DR
This paper investigates proximity heuristics in length normalization for ranking functions, demonstrating that a proximity-based ranking function can outperform BM25 by 52% on unstructured text data.
Contribution
It introduces a generalized ranking function with application-dependent features and provides guidelines for feature engineering in information retrieval.
Findings
The proximity heuristic improves ranking performance significantly.
The proposed function outperforms BM25 by 52%.
The paper offers guidelines for constructing features in ranking functions.
Abstract
Ranking functions in information retrieval are used primarily in search engines and are often adopted for various language processing applications. However, the features used to construct a ranking function should be analyzed before the function is applied to a data set. This paper gives guidelines on constructing generalized ranking functions with application-dependent features. It prescribes a specific case of the generalized function for a recommendation system, using the feature engineering guidelines on the given data set. The behavior of both the generalized and the specific functions is studied, and both are implemented on unstructured textual data. The proximity-feature-based ranking function outperforms regular BM25 by 52%.
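For readers unfamiliar with the baseline, the following is a minimal Python sketch of standard Okapi BM25 together with one possible proximity feature. It is not the paper's function, which this summary does not reproduce. The names `bm25_score`, `min_pair_distance`, and `proximity_score`, the defaults `k1=1.2` and `b=0.75`, the minimum-pair-distance measure, and the additive combination with `weight` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        # Smoothed IDF variant that stays non-negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Length normalization: documents longer than average are penalized.
        norm = k1 * (1.0 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1.0) / (tf[term] + norm)
    return score


def min_pair_distance(query, doc):
    """Smallest token distance between occurrences of two distinct query terms.

    Returns len(doc) when fewer than two distinct query terms occur.
    """
    wanted = set(query)
    positions = [(i, t) for i, t in enumerate(doc) if t in wanted]
    best = len(doc)
    # The closest pair of distinct terms is always adjacent among the matches.
    for (i, t1), (j, t2) in zip(positions, positions[1:]):
        if t1 != t2:
            best = min(best, j - i)
    return best


def proximity_score(query, doc, corpus, weight=1.0):
    """Hypothetical combination: BM25 plus a bonus that decays with distance."""
    dist = min_pair_distance(query, doc)
    return bm25_score(query, doc, corpus) + weight * math.log(1.0 + math.exp(-dist))
```

In this sketch, two documents with identical term counts and lengths receive the same BM25 score, and the proximity bonus separates them by ranking the document whose query terms occur closer together higher.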
Taxonomy
Topics: Data Management and Algorithms · Text and Document Classification Technologies · Recommender Systems and Techniques
