CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification
Siddhant Kharbanda, Atmadeep Banerjee, Erik Schultheis, Rohit Babbar

TL;DR
CascadeXML introduces a multi-resolution transformer-based approach for extreme multi-label classification, effectively utilizing separate feature representations at different label resolutions to outperform existing methods on large-scale benchmarks.
Contribution
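The paper proposes CascadeXML, an end-to-end multi-resolution learning pipeline that uses the multi-layered architecture of a transformer to keep a separate feature representation for each label resolution in the label tree, instead of sharing one representation across all resolutions.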
Findings
Outperforms existing XMC methods on large benchmark datasets
Achieves significant accuracy gains with its multi-resolution approach
Demonstrates the effectiveness of keeping a separate feature representation for each label resolution
Abstract
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent approaches, such as XR-Transformer and LightXML, leverage a transformer instance to achieve state-of-the-art performance. However, in this process, these approaches need to make various trade-offs between performance and computational requirements. A major shortcoming, as compared to the Bi-LSTM based AttentionXML, is that they fail to keep separate feature representations for each resolution in a label tree. We thus propose CascadeXML, an end-to-end multi-resolution learning pipeline, which can harness the multi-layered architecture of a transformer model for attending to different label resolutions with separate feature representations. CascadeXML significantly outperforms all existing approaches with…
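The core idea in the abstract, using different transformer layers to serve different label resolutions, can be illustrated with a toy sketch. This is not the authors' implementation: the random "layers" stand in for a real transformer, and the coarse-to-fine shortlisting, the layer-to-level assignment `layer_for_level`, and every function name here are assumptions made for illustration only.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def toy_layer_features(x, num_layers, rng):
    """Stand-in for a transformer: returns one feature vector per layer."""
    feats, h = [], list(x)
    for _ in range(num_layers):
        # Hypothetical layer: random linear mixing followed by ReLU.
        w = [[rng.uniform(-1, 1) for _ in h] for _ in h]
        h = [max(0.0, dot(row, h)) for row in w]
        feats.append(h)
    return feats

def cascade_predict(layer_feats, classifiers, layer_for_level, branching, top_k):
    """Score the label tree coarse-to-fine, one layer's features per level."""
    shortlist = []
    candidates = list(range(len(classifiers[0])))  # all coarsest clusters
    for level, weights in enumerate(classifiers):
        # Each resolution reads its own feature representation.
        h = layer_feats[layer_for_level[level]]
        ranked = sorted(candidates, key=lambda c: -dot(weights[c], h))
        shortlist = ranked[:top_k]
        if level + 1 < len(classifiers):
            # Only children of shortlisted clusters are scored at the next level.
            candidates = [c * branching + j
                          for c in shortlist for j in range(branching)]
    return shortlist

rng = random.Random(0)
dim, branching, top_k = 4, 2, 2
level_sizes = [2, 4, 8]  # two cluster levels, then 8 leaf labels
classifiers = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
               for n in level_sizes]
layer_feats = toy_layer_features([0.5, -0.2, 0.1, 0.9], num_layers=3, rng=rng)
preds = cascade_predict(layer_feats, classifiers, layer_for_level=[0, 1, 2],
                        branching=branching, top_k=top_k)
print(preds)  # indices of the top-k leaf labels
```

In this sketch the coarsest clusters are scored from the earliest layer and the leaf labels from the last one, so no two resolutions share a feature vector. Which layers CascadeXML actually assigns to which resolution, and how the levels are trained jointly, is not stated in this summary.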
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
