Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi

TL;DR
This paper demonstrates that pretrained deep learning models can outperform Gradient Boosted Decision Trees in tabular Learning-to-Rank tasks when labeled data is scarce, especially by leveraging unlabeled data through pretraining.
Contribution
The study introduces the use of unsupervised pretraining for deep models in tabular Learning-to-Rank, showing significant improvements over GBDTs in label-scarce scenarios.
Findings
Under label scarcity, pretrained DL rankers outperform GBDT rankers on ranking metrics, sometimes by as much as 38%.
DL models excel on outliers in ranking tasks.
Unsupervised pretraining effectively exploits unlabeled data.
Abstract
On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. However, these works often study idealized problem settings which may fail to capture complexities of real-world scenarios. We identify a natural tabular data setting where DL models can outperform GBDTs: tabular Learning-to-Rank (LTR) under label scarcity. Tabular LTR applications, including search and recommendation, often have an abundance of unlabeled data, and scarce labeled data. We show that DL rankers can utilize unsupervised pretraining to exploit this unlabeled data. In extensive experiments over both public and proprietary datasets, we show that pretrained DL rankers consistently outperform GBDT rankers on ranking metrics -- sometimes by as much as 38%…
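The abstract compares rankers on "ranking metrics" without naming one in the text above. As an illustration only, the sketch below computes NDCG@k, a metric commonly used in Learning-to-Rank; treating it as the metric behind the reported gains is an assumption. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical helpers, not taken from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    # Exponential gain (2^rel - 1) with a log2 position discount; rank is 0-based.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    """NDCG@k for one query group: items are ranked by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    # A query with no relevant items has no meaningful ordering; return 0.0 by convention.
    return dcg_at_k(ranked_labels, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In a comparison like the one the abstract describes, a metric of this kind would be computed per query group and averaged over queries for each ranker.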
Peer Reviews
Decision · Submitted to ICLR 2024
- Representation learning in tabular LTR has not been studied much.
- SimCLR-Rank provides a strategy specific to the structure of the LTR task.
- Technical novelty is limited.
- Experiments are limited to three datasets and one private dataset.
- Representation learning on tabular data does not necessarily need to consider the LTR setting, since the multi-layer MLP will be finetuned for the LTR task anyway. As a paper that proposes a new tabular self-supervised learning method, it lacks comparison with other existing methods, such as
- Hajiramezanali, E., Diamant, N. L., Scalia, G., & Shen, M. W. (2022, October). STab: Self-supervised Lear
1. The writing is clear and comprehensible, and the methods presented are straightforward and easy to implement.
2. Through experimentation, this paper demonstrates that pre-trained deep models can match or even surpass GBDTs in ranking tasks. This discovery holds practical value.
3. The paper introduces a pre-training approach that leverages the nature of learning to rank problems, demonstrating reasonable effectiveness, and in certain
1. This paper exhibits notable deficiencies in the aspects of experimental comparisons and discussions on related work. The experimental comparison methodology only considers comparisons between fine-tuning or probing methods based on pre-trained models, as well as MLP models. I believe that in the realm of learning to rank and tabular data, there are likely more recent deep learning methods that could serve as baselines for comparison. Proper discussions about these methods should also be incor
S1: To the reviewer’s knowledge, this is the first work that shows some promise for pre-trained DNNs on the LTR task. The reviewer had thought about this direction, but it was not intuitively clear how to do it or whether it has benefits. The paper still has several caveats but is a decent exploration in some aspects.
S2: The motivation of the ranking contrastive loss is clear and easy to understand - it is clear what the hard negatives are for LTR problems, so it is good to leverage that.
S3: It is go
W1: The change over SimCLR is incremental.
- The major weakness of SimCLR for non-ranking problems was complexity. The authors made a good point that hard negatives are clear for LTR (mentioned in S1), but wouldn't SimCLR with small batches largely resolve the issues? So the necessity for a new loss is not very convincing - in fact, the authors did not comprehensively compare with that baseline.
- Also, sometimes having easy negatives may improve the generalization of learning - this may need dee
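The reviews above refer to a ranking-specific contrastive loss (SimCLR-Rank) whose hard negatives come from the structure of the LTR task. A minimal sketch of that idea follows, assuming an InfoNCE-style objective in which the negatives for an item are the other items in the same query group. This is a simplified illustration, not the authors' exact formulation: the function names, the cosine similarity, the one-directional loss, and the default temperature are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_query_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss over ONE query group (hypothetical helper).

    view_a[i] and view_b[i] are embeddings of two augmentations of item i.
    The positive for view_a[i] is view_b[i]; the negatives are view_b[j]
    for the other items j of the same query group, never items from
    other queries.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the positive and in-query negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]  # cross-entropy with target index i
    return total / n
```

Because the loss is computed per query group, its cost grows with the group size rather than with the full batch, which is one way to read the complexity point raised in W1. Whether this matches the paper's actual design should be checked against the paper itself.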
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
Methods: Balanced Selection · Residual Block · Residual Connection · 1x1 Convolution · Batch Normalization · Color Jitter · Kaiming Initialization · Dense Connections · Random Resized Crop
