Neural Active Learning Beyond Bandits

Yikun Ban; Ishika Agarwal; Ziwei Wu; Yada Zhu; Kommy Weldemariam; Hanghang Tong; Jingrui He

arXiv:2404.12522 · cs.LG · April 22, 2024 · 1 citation

TL;DR

This paper introduces neural network-based algorithms for active learning that reduce the negative impact of the number of classes on performance and computational costs, with theoretical guarantees and superior experimental results.

Contribution

Proposes two novel neural network algorithms for active learning that mitigate the effect of the number of classes while retaining the benefits of principled exploration and provable guarantees.

Findings

01. The algorithms outperform state-of-the-art baselines.
02. The theoretical guarantees show slower error growth as the number of classes increases.
03. The empirical improvements are consistent across experiments.

Abstract

We study both stream-based and pool-based active learning with neural network approximations. A recent line of work proposed bandit-based approaches that transform active learning into a bandit problem, achieving both theoretical and empirical success. However, because of this transformation, the performance and computational costs of these methods may be sensitive to the number of classes, denoted as $K$. Therefore, this paper seeks to answer the question: "How can we mitigate the adverse impacts of $K$ while retaining the advantages of principled exploration and provable performance guarantees in active learning?" To tackle this challenge, we propose two algorithms based on newly designed exploitation and exploration neural networks for stream-based and pool-based active learning. Subsequently, we provide theoretical performance guarantees for both algorithms in a non-parametric…
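The abstract names exploitation and exploration networks but does not give their exact form. The following is a minimal sketch under stated assumptions, not the paper's algorithm. It uses a one-hidden-layer exploitation MLP that maps an instance to $K$ scores, and an exploration MLP that reads a gradient feature of the exploitation network and outputs a $K$-dimensional correction. It then applies simple margin-based query rules for the stream-based and pool-based settings. The gradient feature (here, the gradient of the summed outputs with respect to the first-layer weights), the threshold `gamma`, and all helper names (`TwoLayerMLP`, `scores`, `stream_query`, `pool_select`) are hypothetical. Training is omitted, so the networks stay at random initialization.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


class TwoLayerMLP:
    """One-hidden-layer MLP that maps an input vector to K scores (untrained)."""

    def __init__(self, in_dim, hidden, k, rng):
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(hidden, in_dim))
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(k, hidden))

    def forward(self, x):
        return self.W2 @ relu(self.W1 @ x)

    def grad_features(self, x):
        # Hypothetical exploration input: gradient of sum_k f_k(x) w.r.t. W1, flattened.
        pre = self.W1 @ x
        upstream = self.W2.sum(axis=0) * (pre > 0)  # shape (hidden,)
        return np.outer(upstream, x).ravel()  # shape (hidden * in_dim,)


def scores(x, exploit, explore):
    # Exploitation estimate plus an exploration correction, both of length K.
    return exploit.forward(x) + explore.forward(exploit.grad_features(x))


def margin(s):
    # Gap between the two largest scores (requires K >= 2).
    top2 = np.sort(s)[-2:]
    return top2[1] - top2[0]


def stream_query(x, exploit, explore, gamma):
    # Hypothetical stream-based rule: request the label when the top-two gap is small.
    return bool(margin(scores(x, exploit, explore)) < gamma)


def pool_select(pool, exploit, explore):
    # Hypothetical pool-based rule: pick the instance with the smallest top-two gap.
    margins = [margin(scores(x, exploit, explore)) for x in pool]
    return int(np.argmin(margins))


rng = np.random.default_rng(0)
d, m, K = 8, 16, 10
exploit = TwoLayerMLP(d, m, K, rng)
explore = TwoLayerMLP(m * d, m, K, rng)  # reads the (m * d)-long gradient feature
pool = rng.normal(size=(20, d))

print(scores(pool[0], exploit, explore).shape)  # (10,)
idx = pool_select(pool, exploit, explore)
print(stream_query(pool[idx], exploit, explore, gamma=np.inf))  # True
```

In this sketch both networks take the raw $d$-dimensional instance (or a feature derived from it) and emit all $K$ scores in one pass. That is the property the reviews below contrast with bandit-style inputs.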

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 2

Strengths

The paper is both intuitive and theoretically sound. It introduces a new exploitation network and exploration network that take the original instance as input and simultaneously output the predicted probabilities for $K$ classes. This approach eliminates the need to transform the instance into $dK$-long vectors. The paper provides theoretical performance guarantees, showing a slower error-growth rate as $K$ increases. Furthermore, it demonstrates that the proposed algorithms achieve the optimal active

Weaknesses

From a theoretical perspective, the core difference from existing methods that transform instances into $dK$-long vectors is not clear.
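Reviewers 01 and 03 contrast the paper's networks, which read the raw $d$-dimensional instance, with earlier bandit-based methods that turn each instance into $K$ vectors of length $dK$. The sketch below illustrates the size difference under stated assumptions. It assumes the common block embedding, in which arm $k$ holds $x$ in its $k$-th block and zeros elsewhere. It counts only first-layer weights of a width-$m$ network. `to_arm_vectors` and `first_layer_params` are hypothetical helpers, and the sizes are illustrative, not taken from the paper's experiments.

```python
from typing import List


def to_arm_vectors(x: List[float], K: int) -> List[List[float]]:
    """Embed x (length d) into K vectors of length d*K; arm k holds x in block k."""
    d = len(x)
    arms = []
    for k in range(K):
        v = [0.0] * (d * K)
        v[k * d:(k + 1) * d] = x  # place x in the k-th block, zeros elsewhere
        arms.append(v)
    return arms


def first_layer_params(in_dim: int, width: int) -> int:
    # Weights of a dense first layer; biases omitted for simplicity.
    return in_dim * width


d, K, m = 784, 10, 100
x = [1.0] * d
arms = to_arm_vectors(x, K)  # bandit style: K inputs, scored with K forward passes

print(len(arms), len(arms[0]))       # 10 7840
print(first_layer_params(d * K, m))  # 784000  (input length dK)
print(first_layer_params(d, m))      # 78400   (input length d, one forward pass)
```

Under this embedding, the bandit-style input layer has $K$ times as many weights, and scoring one instance takes $K$ forward passes instead of one. This is one plausible source of the running-time gap Reviewer 02 mentions.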

Reviewer 02 · Rating 8 · accept, good paper · Confidence 3

Strengths

The theoretical bounds for the proposed method are a substantial improvement over those for the method presented in Wang et al. (2021). The empirical wins, in both increased test-set accuracy and decreased running time, in the stream-based and pool-based settings also demonstrate significant improvement over multiple prior approaches.

Weaknesses

One weakness of the proposed method is that it is demonstrated with a particular neural network structure (an MLP), and the authors do not state whether it can be used with arbitrary network structures, or, if it can, how the bounds might change (I realize this would be a very difficult analysis). Some discussion of how general the approach is would make this paper much less niche. A related weakness is that the bounds are in terms of quantities ($S$ and $L_H$) that are difficult f

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

# Originality
- The idea of exploration-exploitation networks is not very novel, but related works are covered in detail. It originates in [10] and was then adopted by [11], which is one of the works this paper is compared to.
- The idea of reducing the dimension to $d$ is novel, considering the literature starting from [10].
# Quality
- The theoretical proofs seem to be concrete.
- The experiments are extensive and detailed. They validate the claimed performance and computational efficiency.
# Clar

Weaknesses

- The main originality seems to lie in the theoretical proofs rather than in the network structures. However, the proofs depend heavily on NTK techniques.
- The experiments were conducted only five times, which could affect the reliability of the results. I would appreciate seeing results from a larger number of runs.
- The theoretical results cover only the setting where the Tsybakov noise exponent $\alpha = 0$, making them less comparable to existing literature, except when compared to [48] in
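Reviewer 03's last point refers to the Tsybakov noise exponent $\alpha$. For reference, one common binary-classification form of the Tsybakov low-noise condition, with $\eta(x) = \Pr(y = 1 \mid x)$, is shown below. Conventions differ across papers, and multi-class versions typically use the gap between the two largest class probabilities instead. This is therefore not necessarily the exact form used in the paper or in [48].

```latex
\Pr_{x}\left( \left| \eta(x) - \tfrac{1}{2} \right| \le \tau \right) \le c \, \tau^{\alpha},
\qquad \text{for all } \tau > 0, \ \text{with constants } c > 0, \ \alpha \ge 0.
```

With $\alpha = 0$ the right-hand side is the constant $c$, so the condition holds trivially for $c \ge 1$ and imposes no low-noise assumption. Larger $\alpha$ places less probability mass near the decision boundary. This is why results stated only for $\alpha = 0$ are hard to compare with bounds that improve as $\alpha$ grows.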
