Text2Model: Text-based Model Induction for Zero-shot Image Classification

Ohad Amosy; Tomer Volk; Eilam Shapira; Eyal Ben-David; Roi Reichart; and Gal Chechik

arXiv:2210.15182 · cs.CV · October 1, 2024

TL;DR

This paper introduces Text2Model, a hypernetwork that generates a task-specific classifier at inference time from text descriptions of the classes. The approach covers zero-shot image classification, 3D point cloud classification, and action recognition, and may be adapted to produce lightweight models for on-device use.

Contribution

It proposes a novel hypernetwork approach that creates non-linear, task-specific classifiers from textual descriptions, enabling versatile zero-shot classification across multiple modalities.

Findings

01 Outperforms previous zero-shot classification methods.

02 Handles rich textual descriptions effectively.

03 Produces lightweight models suitable for on-device use.

Abstract

We address the challenge of building task-agnostic classifiers using only text descriptions, demonstrating a unified approach to image classification, 3D point cloud classification, and action recognition from scenes. Unlike approaches that learn a fixed representation of the output classes, we generate at inference time a model tailored to a query classification task. To generate task-based zero-shot classifiers, we train a hypernetwork that receives class descriptions and outputs a multi-class model. The hypernetwork is designed to be equivariant with respect to the set of descriptions and the classification layer, thus obeying the symmetries of the problem and improving generalization. Our approach generates non-linear classifiers, handles rich textual descriptions, and may be adapted to produce lightweight models efficient enough for on-device applications. We evaluate this approach…
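
The equivariance property mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: `ToyEquivariantHypernet`, its matrices `A` and `B`, the helper functions, and the embedding values are all hypothetical, and the generated classifier here is a single linear layer rather than the non-linear model the paper describes. The sketch only shows the general idea that a hypernetwork maps a set of class-description embeddings to classifier weights, and that reordering the descriptions reorders the generated weights in the same way.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class ToyEquivariantHypernet:
    """Hypothetical toy hypernetwork: description embeddings in, classifier weights out."""

    def __init__(self, text_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        # A acts on each description separately; B acts on the mean over all descriptions.
        self.A = [[rng.uniform(-1, 1) for _ in range(text_dim)] for _ in range(feat_dim)]
        self.B = [[rng.uniform(-1, 1) for _ in range(text_dim)] for _ in range(feat_dim)]

    def generate(self, class_embeddings):
        """Return one weight row per class: w_i = A e_i + B mean(e)."""
        context = matvec(self.B, mean_vec(class_embeddings))
        return [
            [own + shared for own, shared in zip(matvec(self.A, e), context)]
            for e in class_embeddings
        ]


def classify(weights, features):
    """Score input features against each generated weight row; return the argmax index."""
    scores = matvec(weights, features)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up 4-d "text embeddings" for three class descriptions (illustrative values only).
embeddings = [[1.0, 0.0, 0.5, -1.0], [0.0, 2.0, -0.5, 0.3], [0.7, -0.2, 0.1, 0.9]]
hypernet = ToyEquivariantHypernet(text_dim=4, feat_dim=3)
weights = hypernet.generate(embeddings)

# Equivariance check: reordering the descriptions reorders the weight rows the same way.
perm = [2, 0, 1]
permuted = hypernet.generate([embeddings[i] for i in perm])
is_equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row, src in zip(permuted, perm)
    for a, b in zip(row, weights[src])
)
print("equivariant:", is_equivariant)
print("predicted class:", classify(weights, [0.2, -0.4, 1.0]))
```

In a realistic setting the embeddings would come from a text encoder applied to the class descriptions, and the input features from an image, point cloud, or video encoder; both are stubbed out with hand-written vectors here. The per-item term plus shared-mean term is a standard way to build a permutation-equivariant layer over a set, used here as a stand-in for whatever layers the paper actually uses.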

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling