Neighborhood-Regularized Self-Training for Learning with Few Labels

Ran Xu; Yue Yu; Hejie Cui; Xuan Kan; Yanqiao Zhu; Joyce Ho; Chao Zhang; Carl Yang

arXiv:2301.03726 · cs.LG · February 17, 2023

TL;DR

This paper introduces a neighborhood-regularized self-training method that improves learning with few labels by reducing pseudo label noise and stabilizing training, leading to better performance across multiple tasks.

Contribution

The authors propose a neighborhood-based sample selection and prediction aggregation approach to enhance self-training for semi-supervised learning with limited labels.

Findings

01 Outperforms the strongest self-training baseline by 1.83% (text datasets) and 2.51% (graph datasets) on average across eight tasks.

02 Reduces pseudo label noise by 36.8%.

03 Saves 57.3% of training time.

Abstract

Training deep neural networks (DNNs) with limited supervision has been a popular research topic as it can significantly alleviate the annotation burden. Self-training has been successfully applied in semi-supervised learning tasks, but one drawback of self-training is that it is vulnerable to the label noise from incorrect pseudo labels. Inspired by the fact that samples with similar labels tend to share similar representations, we develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels. We further stabilize self-training via aggregating the predictions from different rounds during sample selection. Experiments on eight tasks show that our proposed method outperforms the strongest self-training baseline with 1.83% and 2.51% performance gain for text and graph datasets on average. Our further analysis demonstrates that our proposed data selection…
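
The two ideas in the abstract, neighborhood-based sample selection and aggregation of predictions across rounds, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository listed under Code & Models for that); it is a simplified illustration under stated assumptions. The function names, the use of Euclidean k-nearest neighbors among labeled samples, the KL-divergence score, and the exponential moving average with a `momentum` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def neighborhood_divergence(unl_emb, unl_prob, lab_emb, lab_y, num_classes, k=3, eps=1e-8):
    """Score each unlabeled sample by KL(neighbor label distribution || prediction).

    A low score means the model's prediction agrees with the labels of the
    sample's nearest labeled neighbors in embedding space (assumed criterion).
    """
    # Pairwise squared Euclidean distances, shape (n_unlabeled, n_labeled).
    d2 = ((unl_emb[:, None, :] - lab_emb[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest labeled samples for each unlabeled sample.
    nn_idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Empirical label distribution over those k neighbors, shape (n_unlabeled, C).
    onehot = np.eye(num_classes)[lab_y]
    q = onehot[nn_idx].mean(axis=1)
    # KL(q || p); clipping keeps the logarithm finite when an entry is zero.
    p = np.clip(unl_prob, eps, 1.0)
    return (q * np.log(np.clip(q, eps, 1.0) / p)).sum(axis=1)

def aggregate_scores(prev, current, momentum=0.5):
    """Smooth scores across self-training rounds (moving average is an assumption)."""
    if prev is None:
        return current
    return momentum * prev + (1.0 - momentum) * current

def select_pseudo_labels(score, unl_prob, budget):
    """Keep the `budget` samples with the lowest score and their argmax labels."""
    keep = np.argsort(score, kind="stable")[:budget]
    return keep, unl_prob[keep].argmax(axis=1)

# Toy data: two well-separated clusters in 2-D (made up for illustration).
lab_emb = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
lab_y = np.array([0, 0, 0, 1, 1, 1])
unl_emb = np.array([[0.5, 0.5], [5.5, 5.5], [5.2, 5.2]])
# Sample 1 sits in the class-1 cluster but is confidently predicted as class 0.
unl_prob = np.array([[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]])

div = neighborhood_divergence(unl_emb, unl_prob, lab_emb, lab_y, num_classes=2, k=3)
agg = aggregate_scores(None, div)  # first round: nothing to average with yet
keep, pseudo = select_pseudo_labels(agg, unl_prob, budget=2)
print("selected indices:", keep, "pseudo labels:", pseudo)
```

In this toy setup the confidently mispredicted sample receives a much higher divergence than the other two, so a budget of two leaves it out. A purely confidence-based filter would have kept it, which is the failure mode the neighborhood check is meant to address.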

Code & Models

Repositories

ritaranx/nest
PyTorch · Official

Videos

Neighborhood-Regularized Self-Training for Learning with Few Labels · Underline

Taxonomy

Topics: Machine Learning and Data Classification · Text and Document Classification Technologies