MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures
Saul Calderon-Ramirez, Luis Oala, Jordina Torrents-Barrena, Shengxiang Yang, Armaghan Moemeni, Wojciech Samek, Miguel A. Molina-Cabello

TL;DR
MixMOOD introduces a systematic method using deep dataset dissimilarity measures to select unlabelled data, improving semi-supervised learning robustness against class distribution mismatch and out-of-distribution data.
Contribution
The paper presents deep dataset dissimilarity measures (DeDiMs), which are quick to evaluate and model-agnostic, and demonstrates their effectiveness in ranking unlabelled datasets for semi-supervised learning.
Findings
DeDiMs strongly correlate with MixMatch accuracy across various OOD scenarios.
Semantic similarity is not a reliable heuristic for unlabelled data selection.
MixMOOD can standardize evaluation of semi-supervised methods under real-world OOD conditions.
Abstract
In this work, we propose MixMOOD - a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out-of-distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model-agnostic. They use the feature space of a generic…
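The idea of ranking candidate unlabelled datasets by their dissimilarity to the labelled dataset can be illustrated with a minimal sketch. The abstract above is truncated and does not specify the feature extractor or the distance, so everything here is an assumption for illustration: features are taken as already extracted (synthetic Gaussian vectors stand in for them), and the dissimilarity is a subsampled mean nearest-neighbour Euclidean distance. The names `dataset_dissimilarity` and `rank_unlabelled` are hypothetical and are not the paper's definitions.

```python
import numpy as np


def dataset_dissimilarity(feats_a, feats_b, n_samples=64, n_trials=10, seed=0):
    """Illustrative dissimilarity between two feature matrices (rows = samples).

    Assumption: this subsampled nearest-neighbour distance is a stand-in,
    not the DeDiM definition from the paper.
    """
    rng = np.random.default_rng(seed)
    trial_means = []
    for _ in range(n_trials):
        # Draw a random subsample from each dataset to keep the cost low.
        idx_a = rng.choice(len(feats_a), size=min(n_samples, len(feats_a)), replace=False)
        idx_b = rng.choice(len(feats_b), size=min(n_samples, len(feats_b)), replace=False)
        a, b = feats_a[idx_a], feats_b[idx_b]
        # Pairwise Euclidean distances, shape (len(a), len(b)).
        dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        # For each sample in a, keep the distance to its nearest sample in b.
        trial_means.append(dists.min(axis=1).mean())
    return float(np.mean(trial_means))


def rank_unlabelled(labelled_feats, candidates):
    """Return (name, score) pairs sorted from least to most dissimilar."""
    scores = {
        name: dataset_dissimilarity(labelled_feats, feats)
        for name, feats in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Synthetic stand-ins for extracted features (assumption: 16-dimensional).
rng = np.random.default_rng(42)
labelled = rng.normal(0.0, 1.0, size=(200, 16))
candidates = {
    "near": rng.normal(0.1, 1.0, size=(200, 16)),  # close to the labelled data
    "far": rng.normal(5.0, 1.0, size=(200, 16)),   # strongly shifted
}
ranking = rank_unlabelled(labelled, candidates)
print(ranking)
```

In this toy setting the candidate drawn close to the labelled distribution ranks first. In practice the feature matrices would come from whichever feature extractor is chosen, and the lowest-scoring candidate would be the preferred source of unlabelled data.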
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · COVID-19 diagnosis using AI
