Enhancing Cost Efficiency in Active Learning with Candidate Set Query
Yeho Gwon, Sehyun Hwang, Hoyoung Kim, Jungseul Ok, Suha Kwak

TL;DR
This paper proposes a cost-efficient active learning framework using candidate set queries and conformal prediction to reduce labeling costs significantly, demonstrated on multiple image datasets.
Contribution
Introduces a novel candidate set query method combined with conformal prediction for adaptive, low-cost active learning in image classification.
Findings
Reduces labeling cost by 48% on ImageNet64x64
Effective and scalable across multiple datasets
Improves efficiency of active learning process
Abstract
This paper introduces a cost-efficient active learning (AL) framework for classification, featuring a novel query design called candidate set query. Unlike traditional AL queries requiring the oracle to examine all possible classes, our method narrows down the set of candidate classes likely to include the ground-truth class, significantly reducing the search space and labeling cost. Moreover, we leverage conformal prediction to dynamically generate small yet reliable candidate sets, adapting to model enhancement over successive AL rounds. To this end, we introduce an acquisition function designed to prioritize data points that offer high information gain at lower cost. Empirical evaluations on CIFAR-10, CIFAR-100, and ImageNet64x64 demonstrate the effectiveness and scalability of our framework. Notably, it reduces labeling cost by 48% on ImageNet64x64. The project page can be found at…
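The candidate-set idea in the abstract can be illustrated with a minimal split conformal prediction sketch. This is not the authors' implementation: the nonconformity score (one minus the softmax probability), the function names `conformal_threshold` and `candidate_sets`, and the rule that always keeps the top-1 class so no set is empty are all assumptions made for illustration. Given softmax outputs on a labeled calibration set, the sketch picks a score threshold and then returns, for each unlabeled sample, the classes an oracle would be asked to choose from.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on the score 1 - p(true class).

    cal_probs:  (n, C) softmax outputs on a held-out calibration set.
    cal_labels: (n,) integer ground-truth labels.
    alpha:      target miscoverage rate, e.g. 0.1 for roughly 90% coverage.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: keep every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def candidate_sets(probs, q_hat):
    """Boolean (m, C) mask of candidate classes for each unlabeled sample."""
    probs = np.asarray(probs, dtype=float)
    mask = (1.0 - probs) <= q_hat
    # Assumption for illustration: always keep the top-1 class so that
    # no candidate set is empty.
    mask[np.arange(len(probs)), probs.argmax(axis=1)] = True
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, n_cal, n_pool = 10, 500, 5

    def fake_softmax(labels):
        # Synthetic "model": random logits with a boost on the true class.
        logits = rng.normal(size=(len(labels), num_classes))
        logits[np.arange(len(labels)), labels] += 2.0
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)

    cal_labels = rng.integers(num_classes, size=n_cal)
    pool_labels = rng.integers(num_classes, size=n_pool)
    q_hat = conformal_threshold(fake_softmax(cal_labels), cal_labels, alpha=0.1)
    sets = candidate_sets(fake_softmax(pool_labels), q_hat)
    print("threshold:", q_hat)
    print("candidate set sizes:", sets.sum(axis=1), "of", num_classes, "classes")
```

As the model improves over successive rounds, the calibration scores shrink, the threshold drops, and the candidate sets get smaller, which is the adaptive behavior the abstract describes. How the paper sets the miscoverage level per round and how it prices a query are not reproduced here.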
Peer Reviews
Decision · Submitted to ICLR 2025
1. This paper introduces a novel approach called Candidate Set Query (CSQ), which effectively reduces labeling costs by narrowing down the candidate classes presented to annotators, thereby minimizing annotation time.
2. The proposed method leverages conformal prediction to dynamically produce accurate candidate labels based on a cost-efficient data acquisition function. This function prioritizes samples with high information gain, leading to greater efficiency and reduced labeling costs.
3. The
1. The rationale behind the cost-efficient acquisition function in Eq. (8) needs to be further explained. Additional motivation and explanation for this function are recommended.
2. As shown in Fig. 9a, the performance is sensitive to the hyperparameter d. Providing guidelines for setting this parameter to an appropriate range on different datasets would be beneficial.
3. In realistic scenarios, the samples with high uncertainty waiting to be annotated can be divided into two groups based on the
1. The content of the paper is well presented.
2. The paper studies the cost of AL queries in a more realistic way and proposes a solution for reducing the cost by candidate set query.
3. The candidate set is formed by conformal prediction, and the candidate labels are related to the expected information gain with cost considerations.
The proposed method still depends on conformal prediction and a calibration set to determine the confidence level. It is a realistic solution, but it is not guaranteed to be theoretically sound. Convergence cannot be established through a proper label complexity analysis. Similarly, the labeling cost assumption in Theorem 3.1 is only a rough approximation.
1. The motivation for this paper is clear, and the paper proposes a novel framework of high significance.
2. The paper presents a solid theoretical framework that is thoroughly explained and mostly straightforward to follow.
3. The framework is benchmarked across 3 well-known datasets, empirically demonstrating the effectiveness of the method.
4. Thorough ablation studies were conducted to highlight the significance of each component of the framework.
1. The benchmarks are conducted on very similar datasets (CIFAR-10, CIFAR-100, and ImageNet64x64 are all image classification datasets), and the paper compares against only a small number of baseline AL methods. It is unclear whether the results will generalize across different datasets and domains, or when more advanced underlying AL acquisition methods are used.
2. The paper does not consider the implication of real-world datasets, such as those containing label noise, imbalanced classes, etc. might imp
Taxonomy
Topics: Machine Learning and Algorithms · Educational Technology and Assessment · Intelligent Tutoring Systems and Adaptive Learning
