Multi-modal Extreme Classification

Anshul Mittal; Kunal Dahiya; Shreya Malani; Janani Ramaswamy; Seba Kuruvilla; Jitendra Ajmera; Keng-hao Chang; Sumeet Agarwal; Purushottam Kar; Manik Varma

arXiv:2309.04961 · cs.IR · September 12, 2023

TL;DR

This paper introduces MUFIN, a multi-modal extreme classification method that effectively handles millions of labels by combining visual and textual data, achieving higher accuracy in large-scale product recommendation and prediction tasks.

Contribution

MUFIN is the first multi-modal extreme classification (XC) approach to combine cross-modal attention with scalable training routines, bridging the gap between embedding-based and classifier-based methods.
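To make the idea of cross-modal attention concrete, here is a minimal NumPy sketch in which a datapoint's image and text descriptor embeddings attend to one another before being pooled into a single vector. It is only an illustration under stated assumptions: the function name `cross_modal_attention`, the single-head design, the mean pooling, and the random stand-in embeddings are all hypothetical and are not taken from the MUFIN code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(descriptors, w_q, w_k, w_v):
    """Single-head self-attention over a mixed bag of descriptor embeddings.

    descriptors: (n, d) array stacking image and text embeddings of one item.
    Returns the pooled (d,) representation and the (n, n) attention weights.
    """
    q = descriptors @ w_q
    k = descriptors @ w_k
    v = descriptors @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # each descriptor attends to all others
    attended = weights @ v
    pooled = attended.mean(axis=0)       # assumption: simple mean pooling
    return pooled, weights

rng = np.random.default_rng(0)
d = 8
image_desc = rng.normal(size=(2, d))     # stand-ins for image-encoder outputs
text_desc = rng.normal(size=(3, d))      # stand-ins for text-encoder outputs
descriptors = np.vstack([image_desc, text_desc])
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

pooled, weights = cross_modal_attention(descriptors, w_q, w_k, w_v)
print(pooled.shape, weights.shape)
```

Because image and text descriptors sit in the same attention matrix, each text descriptor can draw on visual information and vice versa, which is the general property the term "cross-modal" refers to.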

Findings

1. MUFIN achieves at least 3% higher accuracy than existing methods on multiple datasets.

2. The paper introduces a new dataset, MM-AmazonTitles-300K, with over 300K products and multi-modal descriptors.

3. Its training and inference routines scale logarithmically with the number of labels.

Abstract

This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid query prediction over several millions of products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based methods. On the other hand, XC methods utilize classifier architectures to offer higher accuracy than embedding-only methods but mostly focus on text-based categorization tasks. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several millions of labels. This presents the twin challenges of developing multi-modal architectures that can offer embeddings sufficiently expressive to allow accurate categorization over millions of labels; and training and inference…
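The distinction the abstract draws between embedding-based and classifier-based methods can be sketched as a two-stage pipeline: label embeddings retrieve a small shortlist, and per-label classifiers then re-rank only that shortlist. The code below is a generic illustration of that pattern, not MUFIN's implementation; `shortlist_then_rank`, the random data, and the choice to initialise classifiers from label embeddings are all assumptions. The shortlist here uses a brute-force scan, which is linear in the number of labels; a sub-linear cost would require an approximate nearest-neighbour index in its place.

```python
import numpy as np

def normalize(x):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def shortlist_then_rank(x, label_emb, classifiers, shortlist_size, k):
    """Retrieve a shortlist by embedding similarity, then re-rank it with classifiers.

    x: (d,) datapoint embedding; label_emb, classifiers: (num_labels, d).
    Returns the top-k label ids and their classifier scores.
    """
    sims = label_emb @ x                             # brute force; an ANN index would go here
    shortlist = np.argsort(-sims)[:shortlist_size]   # embedding-based stage
    scores = classifiers[shortlist] @ x              # classifier-based stage, shortlist only
    order = np.argsort(-scores)[:k]
    return shortlist[order], scores[order]

rng = np.random.default_rng(0)
num_labels, d = 1000, 32
label_emb = normalize(rng.normal(size=(num_labels, d)))
classifiers = label_emb.copy()   # assumption: classifiers start from label embeddings

# A query placed very close to label 42 in embedding space.
x = normalize(label_emb[42] + 0.01 * rng.normal(size=d))

top_labels, top_scores = shortlist_then_rank(x, label_emb, classifiers,
                                             shortlist_size=50, k=5)
print(top_labels, top_scores)
```

Only 50 of the 1000 classifiers are evaluated per query in this sketch, which is why shortlisting keeps the classifier stage affordable as the label set grows.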

Code & Models

Repositories

extreme-classification/mufin (PyTorch, official)
