DXML: Distributed Extreme Multilabel Classification
Pawan Kumar

TL;DR
This paper introduces DXML, a scalable hybrid distributed and shared memory system for extreme multilabel classification that combines MPI across nodes with OpenMP within each node to speed up training and testing on some large datasets.
Contribution
It presents a hybrid distributed-shared memory implementation for extreme multilabel classification, including communication latency analysis and scalability insights.
Findings
Relatively faster training and testing on some large datasets.
Relatively small model sizes in some cases.
The communication and work-span analysis indicates the expected scalability of similar extreme classification methods.
Abstract
Extreme multilabel classification has emerged as an important big data research topic, with applications in the ranking and recommendation of products and items. This paper proposes a scalable hybrid distributed and shared memory implementation of extreme classification for large-scale ranking and recommendation. The implementation combines message passing with MPI across nodes and multithreading with OpenMP within each node. Expressions for communication latency and communication volume are derived, and the parallelism available on a shared memory architecture is derived using the work-span model. These analyses shed light on the expected scalability of similar extreme classification methods. Experiments show that the implementation is relatively fast to train and test on some large datasets, and in some cases the resulting model is relatively small.
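The work-span model named in the abstract can be illustrated with a small sketch. This is not the paper's derivation. It assumes a generic parallel loop over independent tasks, a binary fork tree with a hypothetical per-level `spawn_cost`, and the standard greedy-scheduler bound T_p <= T_1/p + T_inf. The function names and the task costs are illustrative only.

```python
import math


def work_span_summary(task_costs, spawn_cost=1.0):
    """Return (work, span, parallelism) for a parallel loop over independent tasks.

    Assumption: tasks are forked through a binary tree, so the fork depth is
    ceil(log2(n)) and each level costs `spawn_cost` (a hypothetical parameter).
    """
    if not task_costs:
        raise ValueError("task_costs must be non-empty")
    n = len(task_costs)
    # Work T_1: total cost when everything runs on a single thread.
    work = sum(task_costs)
    # Span T_inf: the longest task plus the depth of the fork tree.
    depth = math.ceil(math.log2(n)) if n > 1 else 0
    span = max(task_costs) + spawn_cost * depth
    # Parallelism: the largest speedup any number of threads could give.
    return work, span, work / span


def greedy_time_bound(work, span, p):
    """Upper bound on running time with p threads under a greedy scheduler."""
    return work / p + span


# Illustrative numbers only: 1024 equal-cost tasks of 4 time units each.
costs = [4.0] * 1024
work, span, parallelism = work_span_summary(costs)
for p in (1, 4, 16, 64):
    print(p, greedy_time_bound(work, span, p))
```

In this toy setting the bound keeps shrinking as threads are added until `p` approaches the parallelism `work / span`, after which the span term dominates. That is the kind of scalability limit a work-span analysis is meant to expose.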
Taxonomy
Topics: Text and Document Classification Technologies · Neural Networks and Applications
