Finding Favourite Tuples on Data Streams with Provably Few Comparisons
Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

TL;DR
This paper introduces a single-pass streaming algorithm that identifies high-utility tuples in large data streams using few pairwise comparisons. A variant lets users declare ties, and the authors report better scalability and accuracy than existing methods.
Contribution
The paper presents a single-pass streaming algorithm that finds high-utility tuples with provably few comparisons, a variant that handles ties, and improved pruning techniques based on mathematical programming.
Findings
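Memory size and number of requested comparisons are logarithmic in the number of tuples in the worst case
Lets users declare ties between tuples of nearly equal utility, which can help reduce human error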
Outperforms existing methods in scalability and accuracy
Abstract
One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely adopted approach is to ask the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively receiving user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that the memory size and the number of requested comparisons are in the worst case logarithmic in n, where n is the number of all tuples. An important variant of the problem, which can help to reduce human error in comparisons, is to allow users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the…
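To make the interaction model concrete, here is a minimal sketch of a naive single-pass baseline. It is not the paper's algorithm. It keeps one candidate in memory and asks a simulated user to compare every incoming tuple against it, so it spends n - 1 comparisons on n tuples, which is the cost a method with provably few comparisons aims to avoid. The names `make_oracle` and `naive_stream_favourite`, the hidden linear utility, and the `tie_eps` threshold are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Vec = Sequence[float]


def make_oracle(weights: Vec, tie_eps: float = 0.0) -> Tuple[Callable[[Vec, Vec], int], Dict[str, int]]:
    """Simulated user (hypothetical): a hidden linear utility stands in for
    the unknown preferences. Returns a compare function and a counter."""
    counter = {"comparisons": 0}

    def utility(t: Vec) -> float:
        return sum(w * x for w, x in zip(weights, t))

    def compare(a: Vec, b: Vec) -> int:
        # Each call models one question put to the user.
        counter["comparisons"] += 1
        diff = utility(a) - utility(b)
        if abs(diff) <= tie_eps:
            return 0  # the user declares a tie
        return 1 if diff > 0 else -1

    return compare, counter


def naive_stream_favourite(stream: Iterable[Vec], compare: Callable[[Vec, Vec], int]) -> Optional[Vec]:
    """Naive baseline: one pass, one stored tuple, one comparison per
    incoming tuple after the first. Ties keep the current candidate."""
    best: Optional[Vec] = None
    for t in stream:
        if best is None or compare(t, best) > 0:
            best = t
    return best


if __name__ == "__main__":
    tuples = [(0.2, 0.9), (0.8, 0.1), (0.6, 0.7), (0.5, 0.5)]
    compare, counter = make_oracle(weights=(0.5, 0.5))
    print(naive_stream_favourite(tuples, compare), counter["comparisons"])
```

The comparison counter is the quantity of interest: in this baseline it grows linearly with the stream, whereas the abstract states a worst-case bound that is logarithmic in n for both memory and comparisons.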
Taxonomy
Topics: Data Stream Mining Techniques · Data Management and Algorithms · Advanced Database Systems and Queries
