Improving Term Frequency Normalization for Multi-topical Documents, and Application to Language Modeling Approaches

Seung-Hoon Na; In-Su Kang; Jong-Hyeok Lee

arXiv:1502.02277·cs.IR·February 10, 2015

TL;DR

This paper introduces a new term frequency normalization method that accounts for verbosity and multi-topicality in documents, improving language modeling and retrieval precision.

Contribution

It proposes a partially-axiomatic TF normalization approach that differentiates between verbosity and multi-topicality, enhancing language modeling techniques.

Findings

01 Significant increase in keyword query precision

02 Substantial improvement in MAP for verbose queries

03 Better handling of document length variations

Abstract

Term frequency normalization is a serious issue because document lengths vary. Generally, documents become long for two different reasons: verbosity and multi-topicality. First, verbosity means that the same topic is repeatedly mentioned by terms related to the topic, so that term frequencies are higher than in a well-summarized document. Second, multi-topicality means that a document discusses multiple topics broadly rather than a single topic. Although these document characteristics should be handled differently, all previous methods of term frequency normalization have ignored the difference and have used a simplified length-driven approach that decreases the term frequency based only on the length of a document, causing unreasonable penalization. To attack this problem, we propose a novel TF normalization method that is a type of partially-axiomatic approach.…
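
The distinction the abstract draws can be illustrated with a toy sketch. This is not the paper's method, which the truncated abstract does not spell out. It assumes a hypothetical verbosity proxy (document length divided by the number of distinct terms), and the function names and example documents are invented for illustration.

```python
from collections import Counter


def length_normalized_tf(term, tokens):
    """Length-driven baseline: raw count divided by total document length."""
    return Counter(tokens)[term] / len(tokens)


def verbosity(tokens):
    """Hypothetical proxy (an assumption, not from the paper):
    average number of occurrences per distinct term."""
    return len(tokens) / len(set(tokens))


def verbosity_normalized_tf(term, tokens):
    """Divide the raw count by the verbosity proxy only, so adding
    unrelated topics does not lower the score of an existing term."""
    return Counter(tokens)[term] / verbosity(tokens)


# Three toy documents that treat "retrieval" with the same emphasis.
concise = ["retrieval", "model"]
verbose = concise * 3  # same topic, repeated three times
multi_topic = ["retrieval", "model", "protein", "folding", "market", "prices"]

for name, doc in [("concise", concise), ("verbose", verbose), ("multi", multi_topic)]:
    print(name,
          round(length_normalized_tf("retrieval", doc), 3),
          round(verbosity_normalized_tf("retrieval", doc), 3))
```

Under these assumptions, the length-driven score of "retrieval" drops for the multi-topic document even though the document mentions the term exactly as often as the concise one, while the verbosity-based score stays the same across all three documents. That drop is the kind of penalization the abstract calls unreasonable.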
