Few-Shot Adaptation for Multimedia Semantic Indexing
Nakamasa Inoue, Koichi Shinoda

TL;DR
This paper introduces a few-shot adaptation framework for multimedia semantic indexing that combines zero-shot and supervised learning, improving performance with limited training data.
Contribution
It presents a method that bridges zero-shot learning and supervised many-shot learning for semantic indexing of image and video data. The method outperforms recent few-shot learning methods on ImageNET and reports the best known results on TRECVID 2014.
Findings
Outperforms recent few-shot learning methods on ImageNET.
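Achieves 15.19% (zero-shot) and 35.98% (supervised) Mean Average Precision on the TRECVID 2014 dataset, which the authors report as the best results to their knowledge.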
Demonstrates robustness with limited training examples.
Abstract
We propose a few-shot adaptation framework, which bridges zero-shot learning and supervised many-shot learning, for semantic indexing of image and video data. Few-shot adaptation provides robust parameter estimation with few training examples, by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously. In this method, first we build a zero-shot detector, and then update it by using the few examples. Our experiments show the effectiveness of the proposed framework on three datasets: TRECVID Semantic Indexing 2010, 2014, and ImageNET. On the ImageNET dataset, we show that our method outperforms recent few-shot learning methods. On the TRECVID 2014 dataset, we achieve 15.19% and 35.98% in Mean Average Precision under the zero-shot condition and the supervised condition, respectively. To the best of our knowledge, these are the best results on this…
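The abstract describes a two-step procedure: build a zero-shot detector, then update it with the few available examples. The sketch below illustrates one way such an update could work. It is not the authors' formulation. It assumes a linear detector, a logistic loss, and a quadratic penalty that keeps the adapted weights close to the zero-shot weights. The names `adapt`, `w0`, `lam`, `lr`, and `steps` are hypothetical. With no examples the detector stays at the zero-shot solution; as examples accumulate, the data term dominates and the result approaches ordinary supervised training.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def adapt(w0, examples, lam=1.0, lr=0.1, steps=200):
    """Gradient descent on: sum_i logloss(w; x_i, y_i) + (lam / 2) * ||w - w0||^2.

    w0       -- weights of the zero-shot detector (assumed linear)
    examples -- list of (feature_vector, label) pairs with label in {0, 1}
    lam      -- strength of the pull toward the zero-shot weights (assumption)
    """
    w = list(w0)
    for _ in range(steps):
        # Gradient of the penalty term that ties w to the zero-shot weights.
        grad = [lam * (wj - w0j) for wj, w0j in zip(w, w0)]
        # Gradient of the logistic loss over the few labeled examples.
        for x, y in examples:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy data: the zero-shot detector relies on feature 0,
# but the few labeled examples favor feature 1.
w_zero_shot = [1.0, 0.0]
few_shots = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]
w_adapted = adapt(w_zero_shot, few_shots)
```

In this toy setup, a larger `lam` keeps the detector closer to the zero-shot weights, while a smaller `lam` lets the few examples reshape it more freely.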
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
