Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions
Kosuke Nishida, Kyosuke Nishida, Shuichi Nishioka

TL;DR
This paper introduces LIDE, a model that leverages machine- and user-generated natural language descriptions to enhance few-shot image classification, demonstrating improved performance and interpretability.
Contribution
The paper proposes LIDE, a novel model that integrates text generation and encoding to utilize natural language descriptions for better few-shot learning.
Findings
LIDE outperforms baseline models with machine-generated descriptions.
High-quality user descriptions further improve classification accuracy.
Generated descriptions serve as explanations aligned with predictions.
Abstract
Humans can acquire knowledge of novel visual concepts from language descriptions, so we use the few-shot image classification task to investigate whether a machine learning model can do the same. Our proposed model, LIDE (Learning from Image and DEscription), has a text decoder that generates descriptions and a text encoder that obtains text representations of machine- or user-generated descriptions. We confirmed that LIDE with machine-generated descriptions outperformed baseline models, and that performance improved further with high-quality user-generated descriptions. The generated descriptions can be viewed as explanations of the model's predictions, and we observed that these explanations were consistent with the prediction results. We also investigated why the language descriptions improved few-shot image classification performance by comparing the…
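The abstract does not say how LIDE combines image and description representations. To illustrate the general idea of classifying with both, here is a minimal sketch that swaps in a prototypical-network-style nearest-prototype classifier. It is not LIDE's method. All function names are hypothetical, and fusing the two feature types by plain concatenation is an assumption. The small vectors are toy stand-ins for the outputs of an image encoder and a text encoder. The query's description vector could come from either a machine-generated description (LIDE's decoder role) or a user-written one. Neither encoder nor the decoder is modeled here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fuse(image_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Hypothetical fusion: concatenate image features and description features."""
    return list(image_vec) + list(text_vec)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support: List[Tuple[Vector, Vector, str]]) -> Dict[str, Vector]:
    """Average the fused (image, description) features of each class's support examples."""
    by_class: Dict[str, List[Vector]] = defaultdict(list)
    for image_vec, text_vec, label in support:
        by_class[label].append(fuse(image_vec, text_vec))
    return {label: mean(vecs) for label, vecs in by_class.items()}


def classify(prototypes: Dict[str, Vector], image_vec: Vector, text_vec: Vector) -> str:
    """Assign a query (image + description) to the class with the nearest prototype."""
    query = fuse(image_vec, text_vec)
    return min(prototypes, key=lambda label: sq_dist(prototypes[label], query))


if __name__ == "__main__":
    # Toy 2-way 2-shot episode: (image features, description features, label).
    support = [
        ([1.0, 0.0], [0.9, 0.1], "cat"),
        ([0.8, 0.2], [1.0, 0.0], "cat"),
        ([0.0, 1.0], [0.1, 0.9], "dog"),
        ([0.2, 0.8], [0.0, 1.0], "dog"),
    ]
    protos = build_prototypes(support)
    print(classify(protos, [0.9, 0.1], [0.8, 0.2]))  # cat
    # Ambiguous image features; the description features decide the class.
    print(classify(protos, [0.5, 0.5], [0.1, 0.9]))  # dog
```

In the second query, the image vector is equidistant from both classes in the image dimensions. The description vector alone decides the label. This is one simple way extra text features can disambiguate a few-shot decision, though the paper's own analysis of why descriptions help is truncated in the abstract above.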
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
