Best Practices for Distilling Large Language Models into BERT for Web Search Ranking

Dezhi Ye; Junwei Hu; Jiabin Fan; Bowen Tian; Jie Liu; Haijin Liang; Jin Ma

arXiv:2411.04539·cs.IR·November 8, 2024

TL;DR

This paper presents a method to transfer the ranking capabilities of large language models to smaller, more efficient models like BERT, enabling high-quality web search ranking under resource constraints.

Contribution

The authors introduce a novel training technique combining continued pre-training and hybrid loss functions to distill LLM ranking knowledge into BERT-like models.

Findings

01  The distilled models outperform baseline BERT models in ranking accuracy.

02  The approach reduces computational costs significantly while maintaining high relevance ranking quality.

03  Successful deployment in a commercial search engine demonstrates practical effectiveness.

Abstract

Recent studies have highlighted the significant potential of Large Language Models (LLMs) as zero-shot relevance rankers. These methods predominantly utilize prompt learning to assess the relevance between queries and documents by generating a ranked list of potential documents. Despite their promise, the substantial costs associated with LLMs pose a significant challenge for their direct implementation in commercial search systems. To overcome this barrier and fully exploit the capabilities of LLMs for text ranking, we explore techniques to transfer the ranking expertise of LLMs to a more compact model similar to BERT, using a ranking loss to enable the deployment of less resource-intensive models. Specifically, we enhance the training of LLMs through Continued Pre-Training, taking the query as input and the clicked title and summary as output. We then proceed with supervised…
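The idea of transferring a teacher LLM's ranking behaviour to a smaller student through a ranking loss can be illustrated with a small sketch. This is a generic illustration, not the authors' formulation: the combination of a point-wise term with a pairwise margin term, the function names, the weight `alpha`, and the example scores are all assumptions made for this example.

```python
from itertools import combinations


def pointwise_mse(student, teacher):
    """Mean squared error between student and teacher scores, per document."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def margin_mse(student, teacher):
    """Mean squared error between score margins over all document pairs.

    The student is penalized when the gap it puts between two documents
    differs from the gap the teacher puts between the same two documents.
    """
    pairs = list(combinations(range(len(student)), 2))
    if not pairs:
        raise ValueError("need at least two documents to form a pair")
    total = 0.0
    for i, j in pairs:
        student_margin = student[i] - student[j]
        teacher_margin = teacher[i] - teacher[j]
        total += (student_margin - teacher_margin) ** 2
    return total / len(pairs)


def hybrid_loss(student, teacher, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical mixing weight."""
    return alpha * pointwise_mse(student, teacher) + (1 - alpha) * margin_mse(student, teacher)


# Hypothetical relevance scores for three documents retrieved for one query.
teacher_scores = [0.9, 0.4, 0.1]
student_scores = [0.7, 0.5, 0.2]
loss = hybrid_loss(student_scores, teacher_scores)
print(f"hybrid distillation loss: {loss:.4f}")
```

The margin term depends only on score differences, so a student whose scores are shifted by a constant but preserve the teacher's gaps incurs no margin penalty; the point-wise term is what ties the student to the teacher's absolute scores.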

Taxonomy

Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Adam