Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

TL;DR
This paper introduces a method for rapidly learning new visual concepts from a few images with sentence descriptions, using linguistic context and visual features to infer the meaning of new words and use them in image captions.
Contribution
The paper combines an improved m-RNN captioning model with techniques for learning new concepts from a few examples, including a transposed weight sharing scheme and methods to prevent overfitting to the new concepts.
Findings
The method learns novel visual concepts effectively from a few images with sentence descriptions.
The transposed weight sharing scheme improves image captioning performance.
Three novel concept datasets are constructed for this task.
Abstract
In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently hypothesize the semantic meaning of new words and add them to its word dictionary so that they can be used to describe images which contain these novel concepts. Our method has an image captioning module based on m-RNN with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task. We propose methods to prevent overfitting the new concepts. In addition, three novel concept datasets are constructed for this new task. In the experiments, we show that our method effectively learns novel visual…
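The transposed weight sharing scheme named in the abstract can be illustrated with a minimal sketch. It assumes the scheme reuses the word embedding matrix, transposed, as the final word-prediction layer, with an intermediate projection from the multimodal layer down to the embedding dimension. The names `U_D`, `U_I`, `init_params`, `embed`, `word_logits`, and `add_word` are hypothetical, all sizes are arbitrary, and biases and nonlinearities are omitted; this is not the authors' implementation.

```python
import numpy as np

def init_params(vocab_size, embed_dim, multimodal_dim, seed=0):
    """Create the two weight matrices used in this sketch (hypothetical names)."""
    rng = np.random.default_rng(seed)
    return {
        # Word embedding matrix: one column per word in the dictionary.
        "U_D": rng.normal(scale=0.1, size=(embed_dim, vocab_size)),
        # Assumed intermediate projection: multimodal layer -> embedding space.
        "U_I": rng.normal(scale=0.1, size=(embed_dim, multimodal_dim)),
    }

def embed(params, word_id):
    """Input side: look up the embedding column for a word index."""
    return params["U_D"][:, word_id]

def word_logits(params, multimodal_act):
    """Output side: project to embedding space, then score every word
    with the transpose of the same embedding matrix (the shared weights)."""
    intermediate = params["U_I"] @ multimodal_act
    return params["U_D"].T @ intermediate

def add_word(params, rng):
    """Extend the dictionary by one word: a single new column in U_D
    serves both the input embedding and the output scoring."""
    new_col = rng.normal(scale=0.1, size=(params["U_D"].shape[0], 1))
    params["U_D"] = np.hstack([params["U_D"], new_col])
    return params["U_D"].shape[1] - 1

if __name__ == "__main__":
    params = init_params(vocab_size=5, embed_dim=4, multimodal_dim=6)
    act = np.ones(6)
    print(word_logits(params, act).shape)
    new_id = add_word(params, np.random.default_rng(1))
    print(new_id, word_logits(params, act).shape)
```

Under these assumptions, adding a word introduces only one new embedding column rather than a separate row in an independent output matrix, which is one plausible reading of why the abstract describes the scheme as better suited to novel concept learning.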
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
