Improving Semi-Supervised Support Vector Machines Through Unlabeled Instances Selection

Yu-Feng Li; Zhi-Hua Zhou

arXiv:1005.1545·cs.LG·May 10, 2011·1 cites

Open Access

TL;DR

This paper proposes improving semi-supervised support vector machines (S3VMs) by using hierarchical clustering to select which unlabeled instances to exploit. The aim is to reduce the chance that unlabeled data makes performance worse than using the labeled data alone, while improving generalization.

Contribution

It introduces S3VM-us, a novel approach that selectively exploits unlabeled instances to prevent performance degradation in semi-supervised SVMs.

Findings

01. S3VM-us significantly reduces the chance of performance degeneration.

02. Experiments show improved generalization over existing S3VMs.

03. The method is effective across diverse data sets and settings.

Abstract

Semi-supervised support vector machines (S3VMs) are a popular class of approaches that try to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degrade performance, and the resulting generalization ability may be even worse than that obtained using the labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected so that only those very likely to be helpful are exploited, while highly risky unlabeled instances are avoided. We propose the S3VM-us method, which uses hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of…
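
The selection idea in the abstract can be illustrated with a small sketch. This is not the S3VM-us algorithm itself, whose details are not given on this page. It is a minimal illustration under stated assumptions: single-linkage clustering is used, an unlabeled point is treated as "safe" when its cluster reaches one class noticeably earlier than the other during merging, and the function name `select_unlabeled` and the `min_gap` threshold are hypothetical.

```python
from itertools import combinations
from math import dist


def select_unlabeled(labeled, labels, unlabeled, min_gap=1):
    """Return indices (into `unlabeled`) of points that join one class
    clearly earlier than the other during single-linkage merging.

    Assumption for this sketch: labels are +1 / -1, and "confidence" is
    the difference between the merge steps at which a point first shares
    a cluster with each class. Naive implementation, small inputs only.
    """
    points = list(labeled) + list(unlabeled)
    n_lab = len(labeled)
    # Start with one singleton cluster per point.
    clusters = [{i} for i in range(len(points))]
    # first_hit[i][c]: merge step at which point i first shares a
    # cluster with a labeled point of class c.
    first_hit = [dict() for _ in points]
    for i in range(n_lab):
        first_hit[i][labels[i]] = 0

    step = 0
    while len(clusters) > 1:
        step += 1
        # Single linkage: merge the two clusters whose closest pair of
        # points is nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        merged = clusters[a] | clusters[b]
        classes = {labels[i] for i in merged if i < n_lab}
        for i in merged:
            for c in classes:
                first_hit[i].setdefault(c, step)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    selected = []
    for i in range(n_lab, len(points)):
        # A class never reached (only possible if it has no labeled
        # points) is treated as reached after the final merge.
        pos = first_hit[i].get(1, step + 1)
        neg = first_hit[i].get(-1, step + 1)
        if abs(pos - neg) >= min_gap:
            selected.append(i - n_lab)
    return selected


# Two unlabeled points sit next to a labeled point; one lies halfway
# between the classes and is treated as risky.
chosen = select_unlabeled(
    [(0.0, 0.0), (10.0, 0.0)], [1, -1],
    [(0.5, 0.0), (9.5, 0.0), (5.0, 0.0)],
    min_gap=2,
)
print(chosen)  # [0, 1]
```

In this toy case the midpoint is left out because it joins both classes at nearly the same merge step, which is the kind of ambiguous instance the abstract describes as risky. How the selected instances are then combined with an S3VM is outside the scope of the sketch.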

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Text and Document Classification Technologies