How to Query An Oracle? Efficient Strategies to Label Data
Farshad Lahouti, Victoria Kostina, Babak Hassibi

TL;DR
This paper introduces efficient $k$-ary query strategies for labeling datasets that need significantly fewer queries than the conventional pairwise approach, with the gains supported by both analysis and simulation.
Contribution
It proposes novel randomized and greedy $k$-ary query algorithms that improve labeling efficiency and provides analytical and empirical validation of their performance.
Findings
Query rate of $O(N/k^2)$ with randomized algorithm
Average of about $0.2N$ queries per sample with the greedy scheme
Triplet queries are at most 50% more time-consuming than pairwise queries
Abstract
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning. This is typically an expensive and time-consuming process, and we therefore seek ways to do so efficiently. The conventional approach involves comparing each sample with (the representative of) each class to find a match. In a setting with $N$ equally likely classes, this involves $N/2$ pairwise comparisons (queries per sample) on average. We consider a $k$-ary query scheme with $k \ge 2$ samples in a query that identifies (dis)similar items in the set while effectively exploiting the associated transitive relations. We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(N/k^2)$. In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample…
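To illustrate how a $k$-ary query can exploit transitive relations, here is a minimal Python sketch. It is not the paper's randomized or greedy algorithm. The names `kary_query`, `UnionFind`, and `record_answer` are hypothetical, the oracle is simulated from known labels, and only "same class" answers are tracked (dissimilarity information is ignored for brevity).

```python
from collections import defaultdict


def kary_query(sample_ids, true_label):
    """Simulated oracle (hypothetical): partition the queried samples by class."""
    groups = defaultdict(list)
    for s in sample_ids:
        groups[true_label[s]].append(s)
    return list(groups.values())


class UnionFind:
    """Tracks which samples are already known to share a class."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def record_answer(uf, groups):
    """Merge every group returned by the oracle into one cluster."""
    for group in groups:
        for s in group[1:]:
            uf.union(group[0], s)


# Toy data: the oracle knows these labels, the learner does not.
true_label = ["cat", "dog", "cat", "dog", "cat"]
uf = UnionFind(len(true_label))

# Two triplet queries (k = 3) that share sample 2.
for query in [(0, 1, 2), (2, 3, 4)]:
    record_answer(uf, kary_query(query, true_label))

same_0_4 = uf.find(0) == uf.find(4)  # linked through sample 2
same_1_3 = uf.find(1) == uf.find(3)  # both "dog", but no query links them yet
print(same_0_4, same_1_3)
```

Samples 0 and 4 never appear in the same query, yet they end up in one cluster because both were matched with sample 2. Samples 1 and 3 stay separate until some query connects them, which is the kind of remaining work a query strategy has to schedule.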
