More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data
Saul Calderon-Ramirez, Luis Oala

TL;DR
This paper investigates the limitations of semantic data set matching in semi-supervised deep learning under non-IID data conditions, proposing a density-based dissimilarity measure as a more reliable data selection method.
Contribution
It demonstrates the potential degradation caused by semantic matching and introduces a simulation sandbox plus a density-based criterion for improved unlabelled data selection.
Findings
Semantic data set matching can degrade semi-supervised deep learning (SSDL) performance.
Density-based dissimilarity measures improve data selection.
The non-IID-SSDL sandbox enables stress testing of algorithms.
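To make the idea of stress testing under distribution mismatch concrete, the following is a minimal sketch of how an unlabelled pool with a controlled share of out-of-distribution samples might be assembled. It is not the non-IID-SSDL sandbox itself; the function name `mix_unlabelled` and the parameter `ood_fraction` are hypothetical and chosen only for illustration.

```python
import random


def mix_unlabelled(in_dist, out_dist, ood_fraction, n_total, seed=0):
    """Build an unlabelled pool in which a given fraction is out-of-distribution.

    in_dist, out_dist: lists of samples (assumed: any Python objects).
    ood_fraction: share of the pool drawn from out_dist, in [0, 1].
    Returns (samples, is_ood), where is_ood flags each sample's origin.
    """
    if not 0.0 <= ood_fraction <= 1.0:
        raise ValueError("ood_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_ood = round(ood_fraction * n_total)
    n_in = n_total - n_ood
    # Sample without replacement and tag every item with its origin.
    picked = [(x, False) for x in rng.sample(in_dist, n_in)]
    picked += [(x, True) for x in rng.sample(out_dist, n_ood)]
    rng.shuffle(picked)
    samples = [x for x, _ in picked]
    is_ood = [flag for _, flag in picked]
    return samples, is_ood
```

Sweeping `ood_fraction` from 0 to 1 and retraining the semi-supervised model at each step would give one simple mismatch curve; the origin flags are kept only for evaluation and would not be shown to the learner.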
Abstract
A common heuristic in semi-supervised deep learning (SSDL) is to select unlabelled data based on a notion of semantic similarity to the labelled data. For example, labelled images of numbers should be paired with unlabelled images of numbers instead of, say, unlabelled images of cars. We refer to this practice as semantic data set matching. In this work, we demonstrate the limits of semantic data set matching. We show that it can sometimes even degrade the performance of a state-of-the-art SSDL algorithm. We present and make available a comprehensive simulation sandbox, called non-IID-SSDL, for stress testing an SSDL algorithm under different degrees of distribution mismatch between the labelled and unlabelled data sets. In addition, we demonstrate that simple density-based dissimilarity measures in the feature space of a generic classifier offer a promising and more reliable…
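As an illustration of what a density-based dissimilarity measure in feature space could look like, the sketch below compares per-dimension histograms of two feature sets with the Jensen-Shannon divergence and averages the result. This is an assumed stand-in, not necessarily the measure used in the paper; the function names, the choice of divergence, and the bin count are illustrative assumptions, and the feature vectors are assumed to have been extracted by a classifier beforehand.

```python
import math


def histogram(values, lo, hi, bins):
    """Normalised histogram of values over the shared range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero mass in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def density_dissimilarity(feats_a, feats_b, bins=10):
    """Mean per-dimension JS divergence between two sets of feature vectors."""
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        lo = min(min(col_a), min(col_b))
        hi = max(max(col_a), max(col_b))
        total += js_divergence(histogram(col_a, lo, hi, bins),
                               histogram(col_b, lo, hi, bins))
    return total / dims
```

With base-2 logarithms the score lies between 0 (identical histograms) and 1 (histograms with no overlap), so candidate unlabelled sets could be ranked by their score against the labelled set and the lowest-scoring one preferred.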
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
