The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection
Pascal Mettes, Dennis C. Koelma, Cees G. M. Snoek

TL;DR
This paper introduces a method for reorganizing the ImageNet hierarchy to improve deep neural network pre-training for video event detection, leading to state-of-the-art results on TRECVID datasets.
Contribution
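It proposes bottom-up and top-down reorganizations of the full ImageNet hierarchy (21,814 classes, more than 14 million images) to address over-specific classes and classes with few images during pre-training, which improves video event detection performance.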
Findings
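Video representations from networks pre-trained on the reorganized hierarchy improve over standard pre-training on the TRECVID Multimedia Event Detection 2013 and 2015 datasets.
Representations from different reorganizations are complementary.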
Achieved state-of-the-art results on TRECVID datasets.
Abstract
This paper strives for video event detection using a representation learned from deep convolutional neural networks. Unlike the leading approaches, which all learn from the 1,000 classes defined in the ImageNet Large Scale Visual Recognition Challenge, we investigate how to leverage the complete ImageNet hierarchy for pre-training deep networks. To deal with the problems of over-specific classes and classes with few images, we introduce a bottom-up and top-down approach for reorganization of the ImageNet hierarchy based on all its 21,814 classes and more than 14 million images. Experiments on the TRECVID Multimedia Event Detection 2013 and 2015 datasets show that video representations derived from the layers of a deep neural network pre-trained with our reorganized hierarchy i) improve over standard pre-training, ii) are complementary among different reorganizations, iii)…
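The abstract does not spell out the reorganization rules, so the following is only a toy sketch of what a bottom-up step could look like: classes with too few images are folded into their parent, starting from the deepest nodes. The function name `bottom_up_merge`, the threshold `min_images`, and the tiny hierarchy are all hypothetical and are not taken from the paper; the top-down variant is not covered here.

```python
def bottom_up_merge(parent, image_count, min_images):
    """Fold classes with fewer than `min_images` images into their parent.

    parent:      dict mapping each class to its parent (the root maps to None)
    image_count: dict mapping each class to its number of directly labeled images
    Returns (counts, assignment): image counts of the surviving classes, and a
    map from every original class to the surviving class that absorbs it.
    NOTE: an illustrative assumption, not the paper's actual procedure.
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    counts = {node: image_count.get(node, 0) for node in parent}
    merged_into = {}

    # Visit the deepest classes first so merged images propagate upward.
    for node in sorted(parent, key=depth, reverse=True):
        p = parent[node]
        if p is not None and counts[node] < min_images:
            counts[p] += counts[node]
            merged_into[node] = p
            del counts[node]

    def resolve(node):
        # Follow the chain of merges to the surviving ancestor.
        while node in merged_into:
            node = merged_into[node]
        return node

    assignment = {node: resolve(node) for node in parent}
    return counts, assignment


# Hypothetical five-class hierarchy with made-up image counts.
parent = {
    "animal": None,
    "dog": "animal",
    "terrier": "dog",
    "norfolk terrier": "terrier",
    "cat": "animal",
}
image_count = {"animal": 0, "dog": 500, "terrier": 300,
               "norfolk terrier": 40, "cat": 800}

counts, assignment = bottom_up_merge(parent, image_count, min_images=400)
print(counts)
print(assignment["norfolk terrier"])
```

In this toy run the two terrier classes are absorbed into "dog", which ends up with enough images to stand as a pre-training class; the root is always kept because it has no parent to merge into.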
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
