MuMIC -- Multimodal Embedding for Multi-label Image Classification with Tempered Sigmoid

Fengjun Wang; Sarai Mizrachi; Moran Beladev; Guy Nadav; Gil Amsalem; Karen Lastmann Assaraf; Hadas Harush Boker

arXiv:2211.05232 · cs.CV · November 11, 2022 · 1 citation

TL;DR

MuMIC leverages contrastively pretrained multimodal models with a tempered sigmoid loss for high-performance multi-label image classification, effectively handling noisy data and enabling zero-shot predictions.

Contribution

This paper introduces MuMIC, the first adaptation of contrastively learnt multimodal pretraining for real-world multi-label image classification tasks.

Findings

1. Achieved 85.6% GAP@10 on Booking.com images
2. Outperformed state-of-the-art models in multi-label classification
3. Supported zero-shot and domain-specific predictions
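The page reports GAP@10 without defining it. The sketch below assumes the common Global Average Precision definition used in the YouTube-8M challenge: keep each image's top-k class predictions, pool them across all images, sort by score, and accumulate precision at each correct prediction, with recall measured against every positive label in the dataset. The function name `gap_at_k` and its input layout are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence, Set


def gap_at_k(scores: List[Sequence[float]], labels: List[Set[int]], k: int = 10) -> float:
    """Global Average Precision over the pooled top-k predictions of every sample.

    Assumed definition (YouTube-8M style): GAP = sum_i p(i) * delta_r(i),
    where recall is measured against all positive labels in the dataset.
    """
    pooled = []        # (score, is_positive) for every retained prediction
    num_positives = 0  # total positive labels across all samples
    for sample_scores, positives in zip(scores, labels):
        num_positives += len(positives)
        # Keep only this sample's k highest-scoring classes.
        top = sorted(range(len(sample_scores)),
                     key=lambda c: sample_scores[c], reverse=True)[:k]
        pooled.extend((sample_scores[c], c in positives) for c in top)
    if num_positives == 0:
        return 0.0
    # Rank all retained predictions globally, across images.
    pooled.sort(key=lambda item: item[0], reverse=True)
    hits, gap = 0, 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            # Precision at this rank; the recall step 1 / num_positives is applied at the end.
            gap += hits / rank
    return gap / num_positives


# Two images, three classes; each image's top-2 predictions contain all of its positives.
print(gap_at_k([[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]], [{0, 2}, {1}], k=2))  # prints 1.0
```

Because predictions are pooled before ranking, a confident wrong prediction on one image lowers the precision credited to correct predictions on other images, which is what distinguishes GAP from a per-image average.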

Abstract

Multi-label image classification is a foundational topic in various domains. Multimodal learning approaches have recently achieved outstanding results in image representation and single-label image classification. For instance, Contrastive Language-Image Pretraining (CLIP) demonstrates impressive image-text representation learning abilities and is robust to natural distribution shifts. This success inspires us to leverage multimodal learning for multi-label classification tasks and to benefit from contrastively learnt pretrained models. We propose the Multimodal Multi-label Image Classification (MuMIC) framework, which uses a hardness-aware, tempered-sigmoid-based Binary Cross Entropy loss function, thereby enabling optimization on multi-label objectives and transfer learning on CLIP. MuMIC is capable of providing high classification performance, handling real-world noisy data,…
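The abstract names a tempered-sigmoid Binary Cross Entropy loss applied on top of CLIP but does not give its formula. The sketch below is a minimal, hypothetical illustration under stated assumptions: each class is represented by a text embedding, a class score is the cosine similarity between the image embedding and that class embedding, and the tempered sigmoid takes the form `sigmoid((s - bias) / temperature)`. The `temperature` and `bias` parameters and their default values are assumptions, and the paper's hardness-aware weighting is not modeled. Because every class receives its own independent sigmoid, several labels can be positive at once, and a new class can be scored zero-shot by appending its text embedding.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def tempered_sigmoid(s: float, temperature: float, bias: float = 0.0) -> float:
    """Assumed form: sigmoid((s - bias) / temperature); lower temperature = sharper curve."""
    z = (s - bias) / temperature
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tempered_sigmoid_bce(image_emb: Sequence[float],
                         class_embs: List[Sequence[float]],
                         targets: Sequence[int],
                         temperature: float = 0.1,  # hypothetical default
                         bias: float = 0.5,         # hypothetical default
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over classes for a single image.

    Each class gets an independent sigmoid, so any number of targets may be 1.
    The paper's hardness-aware weighting is NOT modeled here.
    """
    total = 0.0
    for class_emb, y in zip(class_embs, targets):
        p = tempered_sigmoid(cosine(image_emb, class_emb), temperature, bias)
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


def predict_labels(image_emb: Sequence[float],
                   class_embs: List[Sequence[float]],
                   temperature: float = 0.1,
                   bias: float = 0.5,
                   threshold: float = 0.5) -> List[int]:
    """Indices of classes whose tempered-sigmoid score exceeds the threshold.

    A new class can be scored zero-shot by appending its text embedding.
    """
    return [i for i, c in enumerate(class_embs)
            if tempered_sigmoid(cosine(image_emb, c), temperature, bias) > threshold]


# Toy 2-D vectors standing in for CLIP image/text embeddings (hypothetical values).
image = [1.0, 0.0]
classes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(predict_labels(image, classes))  # prints [0, 2]
print(tempered_sigmoid_bce(image, classes, [1, 0, 1]))
```

The bias shifts the decision point away from zero similarity, which matters because CLIP cosine similarities between unrelated images and texts are often still positive; the temperature controls how sharply scores move from 0 to 1 around that point.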

Videos

MuMIC -- Multimodal Embedding for Multi-label Image Classification with Tempered Sigmoid (Underline)

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training