Text2Model: Text-based Model Induction for Zero-shot Image Classification
Ohad Amosy, Tomer Volk, Eilam Shapira, Eyal Ben-David, Roi Reichart, and Gal Chechik

TL;DR
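This paper introduces Text2Model, a hypernetwork-based method that generates a task-specific classifier from text descriptions of the classes at inference time. It is applied to zero-shot image classification, 3D point cloud classification, and action recognition; the hypernetwork's equivariant design is intended to improve generalization, and the generated models can be lightweight.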
Contribution
It proposes a novel hypernetwork approach that creates non-linear, task-specific classifiers from textual descriptions, enabling versatile zero-shot classification across multiple modalities.
Findings
Outperforms previous zero-shot classification methods.
Handles rich textual descriptions effectively.
Produces lightweight models suitable for on-device use.
Abstract
We address the challenge of building task-agnostic classifiers using only text descriptions, demonstrating a unified approach to image classification, 3D point cloud classification, and action recognition from scenes. Unlike approaches that learn a fixed representation of the output classes, we generate at inference time a model tailored to a query classification task. To generate task-based zero-shot classifiers, we train a hypernetwork that receives class descriptions and outputs a multi-class model. The hypernetwork is designed to be equivariant with respect to the set of descriptions and the classification layer, thus obeying the symmetries of the problem and improving generalization. Our approach generates non-linear classifiers, handles rich textual descriptions, and may be adapted to produce lightweight models efficient enough for on-device applications. We evaluate this approach…
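The core idea in the abstract, a hypernetwork that maps a set of class descriptions to the weights of a classifier while remaining equivariant to the order of those descriptions, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' architecture: `text_embed` is a hypothetical stand-in for a real text encoder, `EquivariantHypernet` uses a single DeepSets-style layer (a per-class term plus a mean-pooled term), and it emits only one linear classification layer, whereas the abstract describes non-linear generated classifiers.

```python
import numpy as np


def text_embed(descriptions, dim=16):
    """Hypothetical stand-in for a text encoder: hashed bag-of-words vectors."""
    out = np.zeros((len(descriptions), dim))
    for i, text in enumerate(descriptions):
        for word in text.lower().split():
            # Deterministic bucket index derived from character codes.
            out[i, sum(ord(c) for c in word) % dim] += 1.0
    return out


class EquivariantHypernet:
    """Illustrative hypernetwork: class embeddings in, classifier weights out.

    Row i of the output depends on embedding i and on a mean over all
    embeddings, so permuting the input rows permutes the output rows
    in the same way (permutation equivariance).
    """

    def __init__(self, text_dim, feat_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained random parameters; a real system would learn these.
        self.A = rng.normal(scale=0.1, size=(text_dim, hidden))  # per-class term
        self.B = rng.normal(scale=0.1, size=(text_dim, hidden))  # pooled term
        self.C = rng.normal(scale=0.1, size=(hidden, feat_dim))  # output head

    def generate(self, class_embeddings):
        pooled = class_embeddings.mean(axis=0, keepdims=True)
        hidden = np.tanh(class_embeddings @ self.A + pooled @ self.B)
        return hidden @ self.C  # shape: (n_classes, feat_dim)


def classify(weights, features):
    """Apply the generated linear layer to one feature vector."""
    return int(np.argmax(weights @ features))


descriptions = [
    "a small bird with red wings",
    "a large grey elephant",
    "a striped orange cat",
]
hypernet = EquivariantHypernet(text_dim=16, feat_dim=8)
W = hypernet.generate(text_embed(descriptions))
features = np.random.default_rng(1).normal(size=8)  # placeholder input features
predicted = descriptions[classify(W, features)]
```

Because the parameters here are random and untrained, the prediction itself is meaningless; the sketch only shows the data flow and the symmetry property. Reordering `descriptions` reorders the rows of `W` correspondingly, so the generated classifier does not depend on the order in which classes are listed.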
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
