Enterprise-Scale Search: Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees
Philip A. Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon

TL;DR
This paper introduces MSCM, a specialized sparse matrix technique that significantly accelerates inference in extreme multi-label ranking trees, enabling faster, more efficient industrial-scale semantic search with no loss in accuracy.
Contribution
The paper presents MSCM, a novel sparse matrix multiplication method tailored for XMR trees, providing substantial speedups without sacrificing model accuracy.
Findings
MSCM achieves up to an 8x reduction in inference latency.
The speedup holds across a range of datasets and tree models.
Accuracy is unchanged relative to standard inference methods.
Abstract
Tree-based models underpin many modern semantic search engines and recommender systems due to their sub-linear inference times. In industrial applications, these models operate at extreme scales, where every bit of performance is critical. Memory constraints at extreme scales also require that models be sparse, hence tree-based models are often back-ended by sparse matrix algebra routines. However, there are currently no sparse matrix techniques specifically designed for the sparsity structure one encounters in tree-based models for extreme multi-label ranking/classification (XMR/XMC) problems. To address this issue, we present the masked sparse chunk multiplication (MSCM) technique, a sparse matrix technique specifically tailored to XMR trees. MSCM is easy to implement, embarrassingly parallelizable, and offers a significant performance boost to any existing tree inference pipeline at…
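The abstract names the technique but does not spell out its mechanics, so the following is only an illustrative sketch of the general idea of a masked, chunked sparse product, not the authors' implementation. It assumes that the ranker weights for the children of each tree node are stored together as one "chunk" keyed by feature index, and that a mask (the parents kept by the search at the previous tree level) selects which chunks are evaluated at all. The names `Chunk` and `masked_chunk_scores` and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    """Weights for all children of one parent node (hypothetical layout)."""
    children: List[int]           # node ids of the siblings in this chunk
    rows: Dict[int, List[float]]  # feature index -> one weight per child


def masked_chunk_scores(
    query: Dict[int, float],      # sparse query: feature index -> value
    chunks: Dict[int, Chunk],     # parent node id -> chunk of its children
    active_parents: List[int],    # the mask: parents that are still in play
) -> Dict[int, float]:
    """Score only the children of active parents; skip every other chunk."""
    scores: Dict[int, float] = {}
    for parent in active_parents:
        chunk = chunks[parent]
        acc = [0.0] * len(chunk.children)
        # Walk the smaller index set and probe the larger one.
        if len(query) <= len(chunk.rows):
            shared = [f for f in query if f in chunk.rows]
        else:
            shared = [f for f in chunk.rows if f in query]
        for f in shared:
            x = query[f]
            # One feature lookup updates every sibling in the chunk.
            for j, w in enumerate(chunk.rows[f]):
                acc[j] += x * w
        for child, s in zip(chunk.children, acc):
            scores[child] = s
    return scores


# Toy data: two parents (0 and 1), each with two children.
query = {0: 1.0, 2: 2.0}
chunks = {
    0: Chunk(children=[10, 11], rows={0: [0.5, -1.0], 1: [1.0, 1.0]}),
    1: Chunk(children=[12, 13], rows={2: [0.25, 0.0], 3: [2.0, 2.0]}),
}
masked = masked_chunk_scores(query, chunks, active_parents=[1])
full = masked_chunk_scores(query, chunks, active_parents=[0, 1])
```

In this sketch, masking out parent 0 means its chunk is never touched, and the scores for the children of parent 1 are identical to those from the unmasked call, which is consistent with the claim that the speedup comes at no cost in accuracy.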
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Graph Neural Networks · Topic Modeling
