Uncertainty for Active Learning on Graphs

Dominik Fuchsgruber, Tom Wollschläger, Bertrand Charpentier, Antonio Oroz, Stephan Günnemann

arXiv:2405.01462 · cs.LG · February 28, 2025

TL;DR

This paper conducts a comprehensive study of uncertainty sampling for node classification on graphs, introducing ground-truth Bayesian uncertainty estimates and demonstrating their effectiveness over existing methods.

Contribution

The paper presents the first extensive analysis of uncertainty sampling on graphs. It proposes ground-truth Bayesian uncertainty estimates and an approximate method that outperforms existing uncertainty estimators.

Findings

01. A performance gap between uncertainty sampling and other active learning strategies is identified on graphs.
02. Ground-truth Bayesian uncertainty estimates effectively guide active learning.
03. The proposed approximate method outperforms other uncertainty estimators on real datasets.

Abstract

Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models by iteratively acquiring labels of data points with the highest uncertainty. While it has proven effective for independent data, its applicability to graphs remains under-explored. We propose the first extensive study of Uncertainty Sampling for node classification: (1) We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies. (2) We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries. We confirm our results on synthetic data and design an approximate approach that consistently outperforms other uncertainty estimators on real datasets. (3) Based on this…
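The loop the abstract describes can be illustrated with a minimal, model-agnostic sketch: retrain, score every unlabeled node, query the most uncertain one, and repeat. This sketch scores nodes by predictive entropy. The abstract says the study goes beyond this kind of predictive uncertainty, so this is only the simplest baseline. `predict` and `oracle` are hypothetical callbacks that stand in for a node classifier (for example a GNN) and a human annotator. This is not the paper's implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one categorical predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def acquire(probs_per_node: Sequence[Sequence[float]], labeled: Set[int]) -> int:
    """Index of the unlabeled node whose predictive distribution has maximal entropy."""
    candidates = [i for i in range(len(probs_per_node)) if i not in labeled]
    if not candidates:
        raise ValueError("no unlabeled nodes left to query")
    return max(candidates, key=lambda i: entropy(probs_per_node[i]))


def uncertainty_sampling(
    predict: Callable[[Dict[int, int]], List[List[float]]],
    oracle: Callable[[int], int],
    initial: Dict[int, int],
    budget: int,
) -> Dict[int, int]:
    """Run up to `budget` acquisition rounds and return the final node -> label map."""
    labels = dict(initial)
    for _ in range(budget):
        # Hypothetical: retrain the classifier on `labels`, then predict every node.
        probs = predict(labels)
        if len(labels) >= len(probs):
            break  # every node is already labeled
        node = acquire(probs, set(labels))
        labels[node] = oracle(node)  # query the label of the most uncertain node
    return labels


# Toy usage: fixed predictions that ignore the labeled set.
toy_probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
result = uncertainty_sampling(lambda labels: toy_probs, lambda i: i % 2, {0: 0}, budget=2)
print(result)  # {0: 0, 1: 1, 2: 0}
```

Node 1 (a 50/50 prediction) is queried first and node 2 second. The confidently predicted node 3 is never queried. The abstract reports a performance gap for uncertainty sampling on graphs, and a greedy, purely predictive criterion like this one is the baseline behind that finding.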

Peer Reviews

Decision: ICML 2024 Poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- This work is well guided by a series of inquiries. It starts from open questions in the literature review, proceeds through empirical observation and theoretical investigation, and ends with experimental confirmation.
- The thorough empirical analysis and the original theoretical insights are of interest to the scientific community.

Weaknesses

- The theoretical investigation, which is a major contribution of the article, not only considers a specific model (which is perfectly acceptable), but also assumes full knowledge of the parameters underlying the model. In practice, the model parameters are rarely known and have to be estimated from data, so their estimation error will contribute to the reducible uncertainty. Therefore, defining the reducible uncertainty while assuming the model parameters are known in advance seems to be problematic…
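The reviewer's point about parameter estimation error can be made concrete with the common entropy-based decomposition applied to samples of the model parameters (ensemble members, MC-dropout passes, or posterior draws). Total predictive entropy splits into expected entropy (aleatoric, irreducible) and mutual information (epistemic, reducible). This is a generic sketch of that standard decomposition, not the paper's ground-truth definition, which instead conditions on the data-generating process. In this sketch, uncertainty about the parameters shows up directly in the epistemic term.

```python
import math
from typing import Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose(samples: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Split total predictive entropy into expected entropy and mutual information.

    `samples` holds one class distribution per sampled parameter setting.
    Returns (total, aleatoric, epistemic).
    """
    m = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / m for c in range(k)]
    total = entropy(mean)                             # H[E_theta p(y | x, theta)]
    aleatoric = sum(entropy(s) for s in samples) / m  # E_theta H[p(y | x, theta)]
    epistemic = total - aleatoric                     # mutual information, >= 0 by concavity of H
    return total, aleatoric, epistemic


# Samples disagree confidently: all uncertainty is epistemic (reducible).
disagree = decompose([[1.0, 0.0], [0.0, 1.0]])
# Samples agree on a 50/50 split: all uncertainty is aleatoric (irreducible).
agree = decompose([[0.5, 0.5], [0.5, 0.5]])
```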

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- This work provides an empirical study of uncertainty sampling (US) for node classification on graphs, highlighting both its efficacy and its potential limitations.
- An important finding is that existing AL methods cannot outperform random sampling baselines.

Weaknesses

- The study primarily concentrates on a specific graph type, the CSBM, which might not fully represent the characteristics of all real-world graphs.
- Novelty concern: undoubtedly, this work offers an extensive exploration of uncertainty-based Active Learning (AL) within the context of graphs. However, it does not introduce any novel methods for active learning in the graph domain.
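For readers unfamiliar with the contextual stochastic block model (CSBM), a minimal two-class sampler shows which structure the reviewer has in mind. Class labels drive both the edge probabilities (`p` within a class, `q` across classes) and the Gaussian node features. The parameter names, the uniform class prior, and the ±`mu` feature means are illustrative assumptions. The paper's exact CSBM parameterization may differ.

```python
import random
from typing import List, Sequence, Set, Tuple


def sample_csbm(
    n: int, p: float, q: float, mu: Sequence[float], sigma: float = 1.0, seed: int = 0
) -> Tuple[List[int], Set[Tuple[int, int]], List[List[float]]]:
    """Sample a two-class CSBM: labels, undirected edges (i < j), and node features."""
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]  # uniform class prior
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Intra-class pairs link with probability p, inter-class pairs with q.
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.add((i, j))
    # Features: isotropic Gaussian around +mu (class 1) or -mu (class 0).
    features = []
    for y in labels:
        sign = 1.0 if y == 1 else -1.0
        features.append([rng.gauss(sign * m, sigma) for m in mu])
    return labels, edges, features


labels, edges, features = sample_csbm(n=50, p=0.3, q=0.02, mu=[1.0, 0.5], seed=42)
```

Setting `p` much larger than `q` yields a homophilic graph, and `p` smaller than `q` yields a heterophilic one. This is one reason a single CSBM configuration may not capture every real-world graph.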

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

a. This paper studies the active learning problem on graph data from an interesting perspective, the uncertainty sampling strategy, and proposes a new Bayesian uncertainty estimate.
b. The authors provide both theoretical analysis and empirical results to show the effectiveness of the proposed estimate.

Weaknesses

a. The theoretical contributions of this paper appear to be somewhat limited. The proposed uncertainty estimate is based on the posterior probability given the ground-truth labels of the unobserved nodes. However, the essence of active learning lies in addressing this problem without access to ground-truth information, and this remains inadequately addressed.
b. The experimental results provided are restricted to synthetic data, and the method's reliance on knowledge of the true data generation process…
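Reviewer 03 describes the estimate as a posterior that conditions on the true labels of the other nodes. This idea can be illustrated on a two-class CSBM. When every model parameter and every other node's label is known, Bayes' rule gives one node's class posterior in closed form. This is a sketch of that idea under stated assumptions (uniform class prior, class means ±`mu`, isotropic noise `sigma`, 0 < `q` < 1 and 0 < `p` < 1). It is not the paper's exact estimator. It also makes the reviewers' objection concrete: every input except the node's own label must be known.

```python
import math
from typing import List, Sequence, Set, Tuple


def node_posterior(
    i: int,
    labels: Sequence[int],
    edges: Set[Tuple[int, int]],
    features: Sequence[Sequence[float]],
    p: float,
    q: float,
    mu: Sequence[float],
    sigma: float,
) -> List[float]:
    """p(y_i = c | x_i, A, y_{-i}) for c in {0, 1} under a fully known two-class CSBM.

    labels[i] itself is ignored; only the other nodes' true labels are used.
    """
    log_scores = []
    for c in (0, 1):
        sign = 1.0 if c == 1 else -1.0
        # Gaussian feature log-likelihood; constant terms cancel after normalization.
        score = -0.5 * sum((x - sign * m) ** 2 for x, m in zip(features[i], mu)) / sigma**2
        # Edge log-likelihood given the true labels of every other node.
        for j in range(len(labels)):
            if j == i:
                continue
            prob = p if labels[j] == c else q
            linked = (min(i, j), max(i, j)) in edges
            score += math.log(prob) if linked else math.log1p(-prob)
        log_scores.append(score)
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Node 0 lies near -mu and links only to node 1, which is also in class 0.
post = node_posterior(0, [0, 0, 1, 1], {(0, 1)}, [[-1.0], [-1.0], [1.0], [1.0]],
                      p=0.9, q=0.1, mu=[1.0], sigma=1.0)
```

The entropy of `post` would serve as the uncertainty score for node 0. In practice, `p`, `q`, `mu`, `sigma`, and the labels of unobserved nodes must be estimated, which is exactly the gap both Reviewer 01 and Reviewer 03 point out.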

Taxonomy

Topics: Machine Learning and Algorithms