Neighborhood-Regularized Self-Training for Learning with Few Labels
Ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao Zhang, Carl Yang

TL;DR
This paper introduces a neighborhood-regularized self-training method that improves learning with few labels by reducing pseudo label noise and stabilizing training, leading to better performance across multiple tasks.
Contribution
The authors propose a neighborhood-based sample selection and prediction aggregation approach to enhance self-training for semi-supervised learning with limited labels.
Findings
Outperforms the strongest self-training baseline by 1.83% on text datasets and 2.51% on graph datasets on average, across eight tasks.
Reduces pseudo label noise by 36.8%.
Saves 57.3% of training time.
Abstract
Training deep neural networks (DNNs) with limited supervision has been a popular research topic as it can significantly alleviate the annotation burden. Self-training has been successfully applied in semi-supervised learning tasks, but one drawback of self-training is that it is vulnerable to the label noise from incorrect pseudo labels. Inspired by the fact that samples with similar labels tend to share similar representations, we develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels. We further stabilize self-training via aggregating the predictions from different rounds during sample selection. Experiments on eight tasks show that our proposed method outperforms the strongest self-training baseline with 1.83% and 2.51% performance gain for text and graph datasets on average. Our further analysis demonstrates that our proposed data selection…
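The two ideas named in the abstract, neighborhood-based sample selection and aggregation of predictions across rounds, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice of cross-entropy against the labels of the k nearest labeled neighbors as the selection score, the momentum-style averaging, and the fixed selection budget are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(queries, keys, k):
    """Indices of the k nearest keys (squared Euclidean) for each query."""
    dists = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(dists, axis=1)[:, :k]

def neighborhood_scores(unl_emb, unl_probs, lab_emb, lab_onehot, k=3, eps=1e-12):
    """Disagreement between each unlabeled prediction and its labeled neighbors.

    Assumed score: mean cross-entropy between the one-hot labels of the
    k nearest labeled samples and the model's predicted distribution.
    Lower means the prediction agrees with its neighborhood.
    """
    nbrs = knn_indices(unl_emb, lab_emb, k)            # (n_unlabeled, k)
    nbr_labels = lab_onehot[nbrs]                      # (n_unlabeled, k, n_classes)
    log_probs = np.log(unl_probs[:, None, :] + eps)    # (n_unlabeled, 1, n_classes)
    cross_entropy = -(nbr_labels * log_probs).sum(axis=-1)
    return cross_entropy.mean(axis=1)

def aggregate(previous, current, momentum=0.5):
    """Blend values from an earlier round with the current round (assumed form)."""
    return momentum * previous + (1.0 - momentum) * current

def select_pseudo_labels(scores, probs, budget):
    """Keep the `budget` samples with the lowest scores and their argmax labels."""
    chosen = np.argsort(scores)[:budget]
    return chosen, probs[chosen].argmax(axis=1)
```

In a self-training loop built on this sketch, `aggregate` would be applied to the scores (or predictions) from successive rounds before calling `select_pseudo_labels`, so that a sample whose prediction flips between rounds is less likely to be selected than one that stays consistent with its neighbors.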
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies
