Label Denoising with Large Ensembles of Heterogeneous Neural Networks
Pavel Ostyakov, Elizaveta Logacheva, Roman Suvorov, Vladimir Aliev, Gleb Sterkin, Oleg Khomenko, and Sergey I. Nikolenko

TL;DR
This paper presents a large-ensemble approach to video classification on YouTube-8M that uses knowledge distillation to handle noisy labels and mixup to improve the base models, while still fitting within the competition's hardware constraints.
Contribution
It introduces a novel ensemble-based solution for large-scale video classification that effectively manages noisy labels using knowledge distillation and data augmentation techniques.
Findings
Large ensemble improves classification accuracy
Knowledge distillation effectively handles noisy labels
Mixup enhances model robustness
Abstract
Despite recent advances in computer vision based on various convolutional architectures, video understanding remains an important challenge. In this work, we present and discuss a top solution for the large-scale video classification (labeling) problem introduced as a Kaggle competition based on the YouTube-8M dataset. We show and compare different approaches to preprocessing, data augmentation, model architectures, and model combination. Our final model is based on a large ensemble of video- and frame-level models but fits into rather limiting hardware constraints. We apply an approach based on knowledge distillation to deal with noisy labels in the original dataset and the recently developed mixup technique to improve the basic models.
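The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `alpha` and `teacher_weight` values, and the choice to blend noisy labels linearly with ensemble (teacher) predictions are all assumptions made for illustration, since this page does not give the exact recipe.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their label vectors (mixup).

    The mixing weight lam is drawn from Beta(alpha, alpha);
    alpha=0.4 is an illustrative value, not one taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def distilled_targets(noisy_labels, teacher_probs, teacher_weight=0.5):
    """Soften noisy multi-hot labels with teacher (ensemble) predictions.

    Assumed form: a convex combination of the original labels and the
    teacher's predicted probabilities, used as targets for a student model.
    """
    if not 0.0 <= teacher_weight <= 1.0:
        raise ValueError("teacher_weight must be in [0, 1]")
    return [
        (1.0 - teacher_weight) * y + teacher_weight * p
        for y, p in zip(noisy_labels, teacher_probs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    x, y, lam = mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 2.0], [0.0, 1.0], rng=rng)
    print("mixed features:", x, "mixed labels:", y, "lambda:", lam)
    print("soft targets:", distilled_targets([1.0, 0.0], [0.6, 0.2]))
```

In this sketch a label the ensemble considers unlikely is pulled toward zero even when the original annotation marks it as present, which is one plausible way distillation can dampen label noise.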
Taxonomy
Topics: Machine Learning and Data Classification · Image Enhancement Techniques · Advanced Vision and Imaging
Methods: Knowledge Distillation · Mixup
