TL;DR
This paper introduces a method combining distillation, pruning, and optimized matrix multiplication to create efficient neural networks for ranking tasks, achieving up to 4x faster scoring while maintaining effectiveness.
Contribution
It speeds up neural network scoring in learning-to-rank by combining knowledge distillation from an ensemble of regression trees, efficiency-oriented pruning of the most computationally intensive layers, and optimized sparse matrix multiplication.
Findings
Neural networks can be distilled and pruned for efficient ranking.
The proposed method achieves up to 4x faster scoring.
Effectiveness is maintained despite increased efficiency.
Abstract
Recent studies in Learning to Rank have shown the possibility to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge distillation to learn shallow neural networks from an ensemble of regression trees. Then, we exploit an efficiency-oriented pruning technique that performs a sparsification of the most computationally-intensive layers of the neural network that is then scored with optimized sparse matrix multiplication. Moreover, by…
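The pruning and sparse-scoring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: simple magnitude pruning stands in for the paper's efficiency-oriented pruning technique, the sparse product is a plain CSR matrix-vector multiply in pure Python rather than an optimized kernel, and every function name, weight, and layer size below is hypothetical.

```python
# Illustrative sketch only. Magnitude pruning is a stand-in for the paper's
# efficiency-oriented pruning; all names and numbers here are hypothetical.

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries of a dense matrix (list of rows).

    Roughly a `sparsity` fraction of entries is dropped; ties at the
    threshold are all dropped, so slightly more may be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_drop = int(len(magnitudes) * sparsity)
    if n_drop == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[n_drop - 1]
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in weights]


def to_csr(dense):
    """Convert a dense matrix to CSR form: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Multiply a CSR matrix by a dense vector, touching only non-zeros."""
    values, col_idx, row_ptr = csr
    out = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        out.append(acc)
    return out


def score(x, first_layer_csr, b1, w2, b2):
    """Score one document with a shallow net: sparse layer, ReLU, linear output."""
    pre = csr_matvec(first_layer_csr, x)
    hidden = [max(0.0, p + b) for p, b in zip(pre, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Hypothetical first-layer weights (3 hidden units, 4 input features).
W1 = [
    [0.5, -0.01, 0.0, 2.0],
    [0.02, -1.5, 0.03, 0.0],
    [-0.04, 0.0, 1.0, 0.05],
]
pruned = magnitude_prune(W1, sparsity=0.5)
csr = to_csr(pruned)
doc_score = score([1.0, 2.0, 3.0, 4.0], csr, [0.0, 0.0, 0.0], [1.0, 1.0, 0.5], 0.1)
```

The sketch prunes only the first layer because, in a shallow ranking network, that layer multiplies the full feature vector and so tends to dominate scoring cost. The distillation step (training the network to reproduce the tree ensemble's scores) is omitted here.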
Taxonomy
Methods: Pruning · Knowledge Distillation
