How to Query An Oracle? Efficient Strategies to Label Data

Farshad Lahouti; Victoria Kostina; Babak Hassibi

arXiv:2110.02341·cs.LG·October 7, 2021

TL;DR

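This paper introduces $k$-ary query strategies for labeling datasets that need fewer queries per sample than the conventional pairwise approach, which takes $N/2$ comparisons on average for $N$ equally likely classes. A randomized batch algorithm achieves a query rate of $O(N/k^2)$, and an adaptive greedy scheme achieves an average of $\approx 0.2N$ queries per sample.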

Contribution

The paper proposes a randomized batch algorithm and an adaptive greedy scheme for $k$-ary queries, both aimed at improving labeling efficiency, and validates their performance analytically and empirically.

Findings

01

Query rate of $O(N/k^2)$ with the randomized batch algorithm

02

Average of $\approx 0.2N$ queries per sample with the adaptive greedy scheme

03

Triplet queries are at most 50% more time-consuming than pairwise queries

Abstract

We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning. This is typically an expensive and time-consuming process and therefore, we seek ways to do so efficiently. The conventional approach involves comparing each sample with (the representative of) each class to find a match. In a setting with $N$ equally likely classes, this involves $N/2$ pairwise comparisons (queries per sample) on average. We consider a $k$-ary query scheme with $k \geq 2$ samples in a query that identifies (dis)similar items in the set while effectively exploiting the associated transitive relations. We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^{2}})$. In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample…
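
The pairwise baseline and the idea of a $k$-ary query described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the paper's algorithms: the function names (`pairwise_queries_for_sample`, `average_pairwise_queries`, `kary_oracle`) are hypothetical, the oracle is simulated from known ground-truth labels, and the sketch assumes representatives are tried in random order with every comparison counted, including the final matching one.

```python
import random


def pairwise_queries_for_sample(true_label, num_classes, rng):
    """Count pairwise queries needed to label one sample.

    Assumption: the sample is compared with one representative per class,
    in random order, until the oracle reports a match.
    """
    order = list(range(num_classes))
    rng.shuffle(order)
    for queries, candidate in enumerate(order, start=1):
        if candidate == true_label:
            return queries
    raise ValueError("true_label must lie in range(num_classes)")


def average_pairwise_queries(num_classes, num_samples, seed=0):
    """Estimate the mean number of pairwise queries per sample
    when the classes are equally likely."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        label = rng.randrange(num_classes)  # equally likely classes
        total += pairwise_queries_for_sample(label, num_classes, rng)
    return total / num_samples


def kary_oracle(labels, items):
    """Hypothetical k-ary oracle: group the queried items by class.

    Only the grouping is returned, not the class names, so the answer
    says which items are similar and which are dissimilar.
    """
    groups = {}
    for item in items:
        groups.setdefault(labels[item], []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    # Under the counting convention above the mean is (N + 1) / 2,
    # which is close to N / 2 for large N.
    print(average_pairwise_queries(num_classes=20, num_samples=20000))
    # One 4-ary query over items 0..3 of a toy labeled set.
    print(kary_oracle([0, 1, 0, 2, 1], [0, 1, 2, 3]))
```

A single answer from `kary_oracle` fixes the same/different relation for every pair among the queried items, that is, up to $k(k-1)/2$ pairwise relations from one query. This is one way to read the abstract's remark about exploiting transitive relations, offered here as an interpretation and not as the authors' construction.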
