Semi-Supervised Graph Imbalanced Regression

Gang Liu; Tong Zhao; Eric Inae; Tengfei Luo; Meng Jiang

arXiv:2305.12087 · cs.LG · May 23, 2023 · 1 citation

TL;DR

This paper introduces a semi-supervised framework for graph regression that addresses data imbalance by pseudo-labeling and data augmentation, significantly improving prediction accuracy for rare label values.

Contribution

It proposes a novel semi-supervised approach combining confidence-based pseudo-labeling and label-anchored mixup to balance training data in graph regression tasks.

Findings

01. Significant reduction in prediction error, especially for under-represented labels.

02. Effective balancing of training data improves model performance across seven regression tasks.

03. Framework outperforms existing methods in handling imbalanced graph datasets.

Abstract

Data imbalance is common in annotated data when observations of certain continuous label values are difficult to collect for regression tasks. In molecule and polymer property prediction, the annotated graph datasets are often small because labeling them requires expensive equipment and effort. To address the lack of examples of rare label values in graph regression tasks, we propose a semi-supervised framework to progressively balance training data and reduce model bias via self-training. The training data balance is achieved by (1) pseudo-labeling more graphs for under-represented labels with a novel regression confidence measurement and (2) augmenting graph examples in latent space for remaining rare labels after data balancing with pseudo-labels. The former is to identify quality examples from unlabeled data whose labels are confidently predicted and sample…
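
The two balancing steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the official repository for that). Everything here is an assumption made for illustration: the function names, the use of fixed label bins, confidence defined as inverse variance across several predictions of the same graph, and a mixup that interpolates the latent vectors of the two examples closest in label to a rare target value.

```python
import numpy as np

def label_bins(y, edges):
    """Map continuous labels to bin indices in [0, len(edges) - 2]."""
    return np.clip(np.digitize(y, edges) - 1, 0, len(edges) - 2)

def select_pseudo_labels(y_labeled, preds, edges):
    """Pick confident unlabeled examples for under-represented label bins.

    preds has shape (n_predictions, n_unlabeled). Confidence is taken to be
    the inverse of the prediction variance, an assumed stand-in for the
    paper's regression confidence measurement.
    """
    n_bins = len(edges) - 1
    counts = np.bincount(label_bins(y_labeled, edges), minlength=n_bins)
    need = counts.max() - counts              # examples each bin is missing
    pseudo_y = preds.mean(axis=0)             # pseudo-label = mean prediction
    confidence = 1.0 / (preds.var(axis=0) + 1e-8)
    pseudo_bin = label_bins(pseudo_y, edges)
    chosen = []
    for b in range(n_bins):
        idx = np.flatnonzero(pseudo_bin == b)
        idx = idx[np.argsort(-confidence[idx])]   # most confident first
        chosen.extend(idx[: need[b]].tolist())
    return np.array(sorted(chosen), dtype=int), pseudo_y

def mixup_near_anchor(anchor_y, z, y, n_new, rng):
    """Create n_new latent points near a rare label value anchor_y.

    Interpolates the latent vectors (rows of z) of the two examples whose
    labels are closest to anchor_y; labels are interpolated the same way.
    This is a simplified, assumed form of mixup in latent space.
    """
    order = np.argsort(np.abs(y - anchor_y))
    i, j = order[0], order[1]
    lam = rng.uniform(0.0, 1.0, size=n_new)
    z_new = lam[:, None] * z[i] + (1.0 - lam[:, None]) * z[j]
    y_new = lam * y[i] + (1.0 - lam) * y[j]
    return z_new, y_new
```

In a self-training loop, one would retrain the regressor on the labeled data plus the selected pseudo-labeled examples, then call the mixup helper only for bins that are still short of examples, and repeat.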

Code & Models

Repositories

liugangcode/SGIR (PyTorch, official)

Taxonomy

Topics: Computational Drug Discovery Methods · Text and Document Classification Technologies · Machine Learning and Data Classification

Methods: Mixup