Finding top-k similar pairs of objects annotated with terms from an ontology
Arnab Bhattacharya, Abhishek Bhowmick, Ambuj K. Singh

TL;DR
This paper addresses the problem of efficiently finding the top-k pairs of objects whose annotations, drawn from a tree-structured ontology, are most similar. It introduces novel algorithms and lower bounds for three distance measures.
Contribution
It proposes new algorithms and lower bounds for top-k object pair retrieval based on three distance measures, including a novel best-first search for average pairwise distance.
Findings
The algorithms are practical and scalable on both real and synthetic data.
The earth mover's distance approach improves matching accuracy.
Efficient pruning significantly speeds up the search process.
Abstract
With the growing focus on semantic searches and interpretations, an increasing number of standardized vocabularies and ontologies are being designed and used to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first one defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second one defines the distance as the average pairwise term distance. The third and most useful distance measure, earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be…
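The three distance measures named in the abstract can be illustrated with a minimal brute-force sketch. It is not the paper's method: the paper develops lower bounds and pruning, while this baseline simply scores every pair. The toy ontology `PARENT`, the edge-count term distance, the equal weight per term in `emd_uniform`, and all function names are assumptions made for illustration; the summary above does not specify them.

```python
import heapq
from itertools import combinations, permutations
from math import gcd

# Hypothetical toy ontology, stored as child -> parent (the root maps to None).
PARENT = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b"}

def path_to_root(term):
    """Return the list of nodes from `term` up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def term_distance(s, t):
    """Edge count between two terms in the tree (assumed term metric)."""
    depth_from_s = {node: d for d, node in enumerate(path_to_root(s))}
    for depth_from_t, node in enumerate(path_to_root(t)):
        if node in depth_from_s:  # first shared ancestor on t's path
            return depth_from_s[node] + depth_from_t
    raise ValueError("terms are not in the same tree")

def min_pairwise(A, B):
    """Measure 1: smallest distance over all term pairs."""
    return min(term_distance(s, t) for s in A for t in B)

def avg_pairwise(A, B):
    """Measure 2: mean distance over all term pairs."""
    total = sum(term_distance(s, t) for s in A for t in B)
    return total / (len(A) * len(B))

def emd_uniform(A, B):
    """Measure 3: earth mover's distance, assuming equal weight per term.

    Each term is replicated so both sides hold L = lcm(|A|, |B|) unit
    masses; the best one-to-one matching is then found by brute force,
    which is only feasible for very small term sets.
    """
    L = len(A) * len(B) // gcd(len(A), len(B))
    xs = [s for s in A for _ in range(L // len(A))]
    ys = [t for t in B for _ in range(L // len(B))]
    best = min(sum(term_distance(s, t) for s, t in zip(xs, perm))
               for perm in permutations(ys))
    return best / L

def top_k_pairs(objects, k, dist):
    """Score every object pair and keep the k smallest distances."""
    scored = ((dist(objects[p], objects[q]), p, q)
              for p, q in combinations(sorted(objects), 2))
    return heapq.nsmallest(k, scored)

# Hypothetical annotated objects: name -> list of ontology terms.
objects = {"x": ["a1", "b1"], "y": ["a2"], "z": ["b1"]}
print(top_k_pairs(objects, 2, min_pairwise))
```

Scoring all pairs costs time quadratic in the number of objects, which is the cost that lower bounds and pruning are meant to avoid. Any of the three functions can be passed as `dist` to compare how the measures rank the same pairs.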
Taxonomy
Topics: Data Management and Algorithms · Semantic Web and Ontologies · Advanced Database Systems and Queries
