Diversifying Top-K Results
Lu Qin, Jeffrey Xu Yu, Lijun Chang

TL;DR
This paper introduces a general framework and new algorithms for diversified top-k search. By accounting for the similarity between results, they reduce redundancy in the returned list, and the paper shows that they run efficiently on large datasets.
Contribution
It proposes a flexible framework that extends existing top-k solutions to diversified search, along with three novel algorithms for selecting an optimal result set.
Findings
The div-cut algorithm finds optimal solutions in seconds for large k
Framework easily extends existing top-k methods to diversify results
Extensive experiments validate high efficiency and effectiveness
Abstract
Top-k query processing finds a list of k results that have the largest scores with respect to a user-given query, under the assumption that all k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other, and as a result some of them are redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some works solve the diversity problem for a specific problem setting and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that considers only the similarity of the search results themselves. We propose a framework, such that most…
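To make the problem concrete, here is a minimal brute-force sketch of one way to state it. It assumes a threshold formulation: two results count as similar when their similarity exceeds a threshold tau, and the answer is a set of at most k pairwise-dissimilar results with the largest total score. The threshold rule, the Jaccard similarity, and the names `diversified_top_k`, `jaccard`, and `Doc` are illustrative assumptions, not taken from the paper, and the exhaustive search is not div-cut or any of the paper's algorithms.

```python
from itertools import combinations
from typing import Callable, NamedTuple, Sequence, TypeVar

T = TypeVar("T")


def diversified_top_k(
    results: Sequence[T],
    score: Callable[[T], float],
    sim: Callable[[T, T], float],
    k: int,
    tau: float,
) -> list[T]:
    """Exhaustively pick at most k pairwise-dissimilar results with maximum total score.

    Assumption (hypothetical formulation): a and b are "similar" when sim(a, b) > tau.
    Scores are assumed non-negative. Exponential time; for illustration only.
    """
    best: tuple[T, ...] = ()
    best_total = 0.0
    for size in range(1, k + 1):
        for subset in combinations(results, size):
            # Reject any candidate set that contains a pair of similar results.
            if any(sim(a, b) > tau for a, b in combinations(subset, 2)):
                continue
            total = sum(score(r) for r in subset)
            if total > best_total:
                best, best_total = subset, total
    # Present the chosen set as a ranked list, highest score first.
    return sorted(best, key=score, reverse=True)


class Doc(NamedTuple):
    """Hypothetical search result: an id, a relevance score, and its terms."""
    name: str
    score: float
    terms: frozenset


def jaccard(a: Doc, b: Doc) -> float:
    """Jaccard similarity of the two term sets (an assumed similarity measure)."""
    union = a.terms | b.terms
    return len(a.terms & b.terms) / len(union) if union else 0.0


docs = [
    Doc("a1", 0.90, frozenset({"top", "k", "query"})),
    Doc("a2", 0.85, frozenset({"top", "k", "query", "search"})),  # near-duplicate of a1
    Doc("b", 0.60, frozenset({"graph", "cut"})),
    Doc("c", 0.50, frozenset({"diversity", "ranking"})),
]

plain = sorted(docs, key=lambda d: d.score, reverse=True)[:3]
diverse = diversified_top_k(docs, lambda d: d.score, jaccard, k=3, tau=0.5)
print([d.name for d in plain])    # ['a1', 'a2', 'b']
print([d.name for d in diverse])  # ['a1', 'b', 'c']
```

Plain top-3 keeps both a1 and a2 even though their Jaccard similarity is 0.75, while the diversified answer drops a2 and brings in c. Exhaustive enumeration grows combinatorially with the number of candidates and with k, so it only serves to pin down what an optimal answer looks like on tiny inputs.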
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Advanced Database Systems and Queries
