Further Theoretical Study of Distribution Separation Method for Information Retrieval
Peng Zhang, Qian Yu, Yuexian Hou, Dawei Song, Jingfei Li, Bin Hu

TL;DR
This paper provides a theoretical analysis of the Distribution Separation Method (DSM) for information retrieval, characterizing its core assumption and its relation to the EM algorithm, with supporting empirical results.
Contribution
It generalizes DSM's theoretical properties by linking its minimum correlation assumption to KL-Divergence, and it shows how DSM connects to the EM algorithm used in mixture models.
Findings
DSM's minimum correlation assumption is equivalent to maximum KL-Divergence assumption.
The EM algorithm in mixture models can be viewed as a distribution separation process.
Empirical results support the theoretical analysis.
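The first finding can be explored numerically. The sketch below is a minimal illustration, not the authors' code. It assumes DSM's linear mixture M = λR + (1 - λ)I over a shared vocabulary, uses Pearson correlation as the correlation measure, and scans λ over the range that keeps the separated distribution R non-negative. The function names (`separate`, `min_feasible_lambda`, `scan`) and the four-term distributions are hypothetical.

```python
import math


def separate(mixture, seed_irrel, lam):
    """Linear separation: solve M = lam * R + (1 - lam) * I for R (0 < lam <= 1)."""
    return [(m - (1.0 - lam) * i) / lam for m, i in zip(mixture, seed_irrel)]


def min_feasible_lambda(mixture, seed_irrel):
    """Smallest lam keeping every R(w) >= 0, i.e. lam >= 1 - min_w M(w) / I(w)."""
    ratio = min(m / i for m, i in zip(mixture, seed_irrel) if i > 0)
    return max(1.0 - ratio, 0.0)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kl(p, q):
    """KL(p || q); assumes q(w) > 0 wherever p(w) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def scan(mixture, seed_irrel, steps=50):
    """Evaluate correlation and KL on a grid of feasible lam values in (lam_min, 1]."""
    lo = min_feasible_lambda(mixture, seed_irrel)
    rows = []
    for k in range(1, steps + 1):
        lam = lo + (1.0 - lo) * k / steps  # skips lo itself, ends at 1.0
        r = separate(mixture, seed_irrel, lam)
        rows.append((lam, pearson(r, seed_irrel), kl(r, seed_irrel)))
    return rows


# Hypothetical toy vocabulary of four terms.
seed_irrel = [0.1, 0.2, 0.3, 0.4]
mixture = [0.34, 0.26, 0.21, 0.19]  # = 0.6 * [0.5, 0.3, 0.15, 0.05] + 0.4 * seed_irrel

rows = scan(mixture, seed_irrel)
lam_min_corr = min(rows, key=lambda row: row[1])[0]  # minimum correlation criterion
lam_max_kl = max(rows, key=lambda row: row[2])[0]    # maximum KL criterion
print(lam_min_corr, lam_max_kl)
```

On this toy input both criteria select the same grid point, the smallest feasible λ. Writing R as I + (M - I)/λ shows why: shrinking λ pushes R along a straight line away from I, which in this example both raises KL(R || I) and lowers the correlation with I. The paper's proof is the reference for the general equivalence.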
Abstract
Recently, a Distribution Separation Method (DSM) has been proposed for relevance feedback in information retrieval; it aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture distribution. Although DSM has achieved promising empirical performance, its theoretical properties still need further study, as does its comparison with other related retrieval models. In this article, we first generalize DSM's theoretical property by proving that its minimum correlation assumption is equivalent to the maximum (original and symmetrized) KL-Divergence assumption. Second, we analytically show that the EM algorithm in a well-known Mixture Model is essentially a distribution separation process and can be simplified using the linear separation algorithm in DSM. Some empirical results are also presented to support our theoretical analysis.
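The EM claim can be illustrated with a small sketch. It assumes the standard two-component feedback mixture p(w|F) = (1 - α)θ(w) + α·p(w|C), with a fixed background weight α; the abstract only says "a well-known Mixture Model", so this form is an assumption. The sketch maps the background model p(w|C) onto DSM's seed irrelevance distribution and 1 - α onto λ, then compares iterative EM with a one-step linear separation. Function names and numbers are hypothetical.

```python
def em_feedback_model(counts, background, alpha, iters=500, tol=1e-12):
    """EM for p(w|F) = (1 - alpha) * theta(w) + alpha * background(w), alpha fixed."""
    n = len(counts)
    theta = [1.0 / n] * n  # uniform initialization
    for _ in range(iters):
        # E-step: probability that an occurrence of w was generated by theta.
        post = [(1 - alpha) * th / ((1 - alpha) * th + alpha * bg)
                for th, bg in zip(theta, background)]
        # M-step: renormalize the expected counts attributed to theta.
        expected = [c * p for c, p in zip(counts, post)]
        total = sum(expected)
        new_theta = [e / total for e in expected]
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta


def linear_separation(counts, background, alpha):
    """One-step separation: theta = (M - alpha * C) / (1 - alpha), M = empirical freqs."""
    total = sum(counts)
    mixture = [c / total for c in counts]
    return [(m - alpha * bg) / (1 - alpha) for m, bg in zip(mixture, background)]


counts = [34, 26, 21, 19]          # hypothetical term counts in feedback documents
background = [0.1, 0.2, 0.3, 0.4]  # hypothetical collection model p(w|C)
alpha = 0.4

em_theta = em_feedback_model(counts, background, alpha)
lin_theta = linear_separation(counts, background, alpha)
print([round(x, 6) for x in em_theta])
print([round(x, 6) for x in lin_theta])
```

Here every term occurs in the feedback documents and the separated distribution is non-negative, so the mixture can reproduce the empirical term frequencies exactly. EM's maximum-likelihood fixed point then coincides with the linear separation, and the two results agree up to floating-point tolerance. If some separated values came out negative, the one-step result would not be a valid distribution and the two would no longer match.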
Taxonomy
Topics: Algorithms and Data Compression · Information Retrieval and Search Behavior · Text and Document Classification Technologies
