Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)
Thomas Bernecker, Hans-Peter Kriegel, Nikos Mamoulis, Matthias Renz, and Andreas Zuefle

TL;DR
This paper presents a scalable, linear-time framework for probabilistic top-k similarity ranking on uncertain vector data, replacing the quadratic-time dynamic programming used by previous approaches.
Contribution
It introduces an incremental, linear-time algorithm for probabilistic ranking that maintains accuracy and reduces computational complexity.
Findings
Achieves linear-time complexity for probabilistic ranking
Demonstrates efficiency on synthetic and real datasets
Maintains the same memory requirements as previous methods
Abstract
This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying a dynamic programming approach of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements,…
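To make the notion of a rank probability distribution concrete, the following is a minimal sketch of a generic dynamic-programming computation for a single instance. It is not the paper's incremental framework. It assumes that the other objects are mutually independent and that, for each of them, the probability of lying closer to the reference object than the instance under consideration is already known. The function name `rank_distribution` and the parameter `closer_probs` are hypothetical names chosen for illustration.

```python
from typing import List


def rank_distribution(closer_probs: List[float], k: int) -> List[float]:
    """Return [P(rank = 1), ..., P(rank = k)] for one instance.

    closer_probs[j] is the (assumed known) probability that the j-th
    other object is closer to the reference object than this instance.
    Objects are assumed to be mutually independent.
    """
    # dist[c] = probability that exactly c of the objects processed so
    # far are closer; the instance then has rank c + 1.
    dist = [1.0] + [0.0] * (k - 1)
    for p in closer_probs:
        # Update from high counts to low so each entry is read before
        # it is overwritten. Mass for counts >= k is dropped, which
        # corresponds to ranks outside the top k.
        for c in range(k - 1, 0, -1):
            dist[c] = dist[c] * (1.0 - p) + dist[c - 1] * p
        dist[0] *= 1.0 - p
    return dist


if __name__ == "__main__":
    # Two other objects, each closer with probability 0.5.
    print(rank_distribution([0.5, 0.5], k=3))
```

Each call costs time proportional to the number of other objects times k. Running such a computation from scratch for every instance is the kind of repeated work that an incremental scheme, as described in the abstract, would aim to avoid; this sketch does not attempt to reproduce that scheme.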
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Time Series Analysis and Forecasting
