Diversification on Big Data in Query Processing

Meifan Zhang; Hongzhi Wang; Jianzhong Li; Hong Gao

arXiv:1808.00986·cs.DB·August 6, 2018

Open Access

TL;DR

This paper introduces a diversification framework for big data query processing that increases the diversity of the result set efficiently, comes with a theoretical guarantee on the probability of success, and is validated through extensive experiments on real and synthetic data.

Contribution

The paper proposes a diversification framework with a theoretical guarantee on the probability of success, together with implementation algorithms tailored to numerical and string data in big data settings.

Findings

01. Framework achieves high effectiveness and efficiency

02. Algorithms perform well on real data

03. Scalability confirmed on synthetic data

Abstract

Recently, in the area of big data, popular applications such as web search engines and recommendation systems face the problem of diversifying results during query processing. It is therefore both significant and essential to propose methods that handle big data while increasing the diversity of the result set. In this paper, we first define the diversity of a set and the ability of an element to improve the set's overall diversity. Based on these definitions, we propose a diversification framework that performs well in terms of both effectiveness and efficiency and has a theoretical guarantee on the probability of success. Second, we design implementation algorithms based on this framework for both numerical and string data. Third, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of our…
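The abstract mentions two definitions, the diversity of a set and the ability of an element to improve that diversity, without giving them here. The sketch below is only one plausible reading, not the paper's definitions or algorithm: it assumes set diversity is the sum of pairwise distances, assumes an element's gain is the increase in that sum when the element is added, and uses a simple greedy selection. The names `set_diversity`, `diversity_gain`, `greedy_diversify`, and `abs_diff` are hypothetical.

```python
from itertools import combinations


def set_diversity(items, dist):
    """Assumed measure (not from the paper): sum of all pairwise distances."""
    return sum(dist(a, b) for a, b in combinations(items, 2))


def diversity_gain(candidate, items, dist):
    """Increase in set_diversity if `candidate` were added to `items`."""
    return sum(dist(candidate, x) for x in items)


def greedy_diversify(candidates, k, dist):
    """Pick up to k items, each time taking the one with the largest gain.

    A generic greedy heuristic used for illustration only; it seeds the
    result with the first candidate.
    """
    if k <= 0 or not candidates:
        return []
    pool = list(candidates)
    result = [pool.pop(0)]
    while pool and len(result) < k:
        best = max(pool, key=lambda c: diversity_gain(c, result, dist))
        pool.remove(best)
        result.append(best)
    return result


def abs_diff(a, b):
    """Distance for numerical data; string data would need its own distance."""
    return abs(a - b)


if __name__ == "__main__":
    picked = greedy_diversify([1, 2, 3, 10, 20], 2, abs_diff)
    print(picked, set_diversity(picked, abs_diff))
```

Because the distance function is passed in as a parameter, the same sketch could be applied to string data by supplying a string distance, such as an edit distance, in place of `abs_diff`.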

Taxonomy

Topics: Data Management and Algorithms · Recommender Systems and Techniques · Data Mining Algorithms and Applications