MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction

Guoyao Li; Ran He; Shusen Jing; Kayhan Behdin; Yubo Wang; Sundara Raman Ramachandran; Chanh Nguyen; Jian Sheng; Xiaojing Ma; Chuanrui Zhu; Sriram Vasudevan; Muchen Wu; Sayan Ghosh; Lin Su; Qingquan Song; Xiaoqing Wang; Zhipeng Wang; Qing Lan; Yanning Chen; Jingwei Wu; Luke Simon; Wenjing Zhang; Qi Guo; Fedor Borisyuk

arXiv:2512.07846 · cs.IR · February 3, 2026

TL;DR

MixLM introduces a novel LLM ranking framework that reduces input length using mixed text and embedding tokens, significantly boosting throughput while maintaining relevance in search systems.
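The throughput argument rests on how many tokens the ranker must prefill per candidate. The back-of-the-envelope sketch below is a hedged illustration, not the paper's analysis. It assumes that only the item description is replaced by a handful of embedding tokens. It also assumes that per-token work scales linearly with length while attention-score work scales roughly quadratically. Every token count is hypothetical, and the names `PrefillComparison` and `compare_prefill` are invented for this example. The sketch ignores batching, caching, and decode behavior, so it is not an end-to-end throughput model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefillComparison:
    """Prefill token counts for one (user, query, item) input, two ways."""
    text_only: int
    mixed: int

    @property
    def linear_ratio(self) -> float:
        # Per-token costs (projections, MLPs) grow linearly with sequence length.
        return self.text_only / self.mixed

    @property
    def quadratic_ratio(self) -> float:
        # Attention-score computation grows roughly with length squared.
        return self.linear_ratio ** 2


def compare_prefill(user_tokens: int, query_tokens: int,
                    item_text_tokens: int, item_embed_tokens: int) -> PrefillComparison:
    # Hypothetical assumption: only the item part is swapped for embedding tokens;
    # user and query context stay as text in both variants.
    context = user_tokens + query_tokens
    return PrefillComparison(
        text_only=context + item_text_tokens,
        mixed=context + item_embed_tokens,
    )


# Hypothetical token counts, for illustration only (not taken from the paper).
result = compare_prefill(user_tokens=300, query_tokens=20,
                         item_text_tokens=1500, item_embed_tokens=8)
print(result.text_only, result.mixed,
      round(result.linear_ratio, 1), round(result.quadratic_ratio, 1))
# prints: 1820 328 5.5 30.8
```

The actual reduction depends on real user, query, and item lengths. The 10x throughput figure reported in the findings is a measured result, and this sketch neither reproduces nor explains it.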

Contribution

The paper presents MixLM, a new LLM ranking method that combines text and embedding tokens to improve efficiency without sacrificing accuracy.

Findings

01. Achieved a 10x throughput increase over strong baselines.
02. Enabled full-traffic deployment of LLM-powered search at LinkedIn.
03. Resulted in a 0.47% increase in Daily Active Users (DAU).

Abstract

Large language models (LLMs) excel at capturing semantic nuances and therefore show impressive relevance ranking performance in modern recommendation and search systems. However, they suffer from high computational overhead under industrial latency and throughput requirements. In particular, cross-encoder ranking systems often create long-context, prefill-heavy workloads, as the model has to be presented with the user, query, and item information. To this end, we propose MixLM, a novel LLM-based ranking framework that significantly improves system throughput by reducing the input context length while preserving the semantic strength of cross-encoder rankers. In contrast to a standard ranking system, where the context is presented to the model as pure text, we propose mix-interaction: a mixture of text and embedding tokens to represent the input. Specifically, MixLM encodes…
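The NumPy sketch below is one hedged way to picture how a mixed text-and-embedding input could be assembled. Text segments are looked up in the model's token-embedding table, and the item is represented by a few precomputed vectors spliced into the same sequence. Everything here is an illustrative assumption, not the paper's implementation. This includes the sizes `HIDDEN`, `VOCAB`, and `ITEM_EMB_TOKENS`, the `item_cache` dictionary, and the helpers `embed_text` and `build_mixed_input`. Random arrays stand in for trained weights, and the item vectors are assumed to already be in the ranker's hidden space.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
HIDDEN = 64           # hidden size of the ranker LLM
VOCAB = 1000          # vocabulary size
ITEM_EMB_TOKENS = 4   # embedding tokens used to represent one item

rng = np.random.default_rng(0)

# Stand-in for the LLM's token-embedding table (random, not trained weights).
token_embedding_table = rng.standard_normal((VOCAB, HIDDEN)).astype(np.float32)

# Hypothetical cache of precomputed item encodings: item_id -> (ITEM_EMB_TOKENS, HIDDEN).
item_cache: dict[str, np.ndarray] = {
    "item-42": rng.standard_normal((ITEM_EMB_TOKENS, HIDDEN)).astype(np.float32),
}


def embed_text(token_ids: list[int]) -> np.ndarray:
    """Look up text tokens in the embedding table; returns (len(token_ids), HIDDEN)."""
    return token_embedding_table[np.asarray(token_ids, dtype=np.int64)]


def build_mixed_input(prefix_ids: list[int], item_id: str,
                      suffix_ids: list[int]) -> np.ndarray:
    """Splice cached item embedding tokens between two embedded text segments."""
    item_tokens = item_cache[item_id]
    if item_tokens.shape != (ITEM_EMB_TOKENS, HIDDEN):
        raise ValueError("cached item encoding has an unexpected shape")
    return np.concatenate(
        [embed_text(prefix_ids), item_tokens, embed_text(suffix_ids)], axis=0
    )


# Hypothetical token ids: user/query text before the item, a short prompt after it.
mixed = build_mixed_input(prefix_ids=list(range(10)), item_id="item-42",
                          suffix_ids=[7, 8])
print(mixed.shape)  # (16, 64)
```

The resulting `(sequence_length, hidden)` array is what a decoder would consume in place of token ids. For instance, Hugging Face Transformers models accept such a tensor, with a batch dimension added, through the `inputs_embeds` argument of their forward pass. In this sketch only the item segment is shortened, so the text tokens still attend to the item tokens inside a single cross-encoder pass.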

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks