Compute Only Once: UG-Separation for Efficient Large Recommendation Models

Hui Lu; Zheng Chai; Shipeng Bai; Hao Zhang; Zhifang Fan; Kunmin Bai; Ke Sun; Yingwen Wu; Bingzheng Wei; Xiang Sun; Ziyan Gong; Tianyi Liu; Hua Chen; Deping Xie; Zhongkai Chen; Zhiliang Guo; Qiwei Chen; Yuchao Zheng

arXiv:2602.10455·cs.IR·May 21, 2026

TL;DR

This paper introduces UG-Sep, a framework that disentangles user and item information in dense models, enabling computation reuse and reducing inference costs in large-scale recommendation systems.

Contribution

UG-Sep is the first framework to enable user-side computation reuse in TokenMixer-based dense interaction models, which lowers the inference cost of large recommendation models.

Findings

01. Reduces inference latency by up to 20% in large-scale recommender systems.

02. Maintains online user experience and commercial metrics after applying UG-Sep.

03. Effectively balances model expressiveness with computational efficiency.

Abstract

Driven by scaling laws, recommender systems increasingly rely on larger models to capture complex feature interactions and user behaviors, but this trend also leads to prohibitive training and inference costs. While long-sequence models can reuse user-side computation through KV caching, such reuse is difficult in TokenMixer-based dense feature interaction architectures, where user and group features are deeply entangled and mixed across layers. In this work, we present User-Group Separation (UG-Sep), an industrial large-scale framework that, for the first time, makes user-side computation reusable in TokenMixer-based dense interaction models. UG-Sep explicitly disentangles user-side and item-side information flows within token-mixing layers, ensuring that a subset of tokens preserves purely user-side representations across layers. This design allows the corresponding per-token…
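The reuse idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a simplified token-mixing layer (a single mixing matrix followed by ReLU) and an assumed masking scheme in which the rows for user tokens carry zero weight on item tokens. All names (`make_layer`, `user_states`, `forward_item`, and so on) are hypothetical. Under those assumptions, the user-token states at every layer do not depend on the candidate item, so they can be computed once per request and shared by all candidates.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def make_layer(n_user, n_item, rng):
    """Random mixing matrix; user rows get zero weight on item columns (assumed mask)."""
    n = n_user + n_item
    w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    for r in range(n_user):
        for c in range(n_user, n):
            w[r][c] = 0.0
    return w

def forward_full(layers, user_tokens, item_tokens):
    """Baseline: run every layer over all tokens for a single candidate."""
    x = user_tokens + item_tokens  # user tokens first, then item tokens
    for w in layers:
        x = relu(matmul(w, x))
    return x

def user_states(layers, user_tokens, n_user):
    """Run the user-only block of each layer once; return the user input to each layer."""
    states, u = [], user_tokens
    for w in layers:
        states.append(u)
        w_uu = [row[:n_user] for row in w[:n_user]]
        u = relu(matmul(w_uu, u))
    return states

def forward_item(layers, states, item_tokens, n_user):
    """Per-candidate work: update only the item tokens, reading cached user states."""
    i = item_tokens
    for w, u in zip(layers, states):
        i = relu(matmul(w[n_user:], u + i))  # item rows see user and item tokens
    return i

# Toy request: one user, several candidate items.
rng = random.Random(0)
n_user, n_item, dim, depth = 3, 2, 4, 3
layers = [make_layer(n_user, n_item, rng) for _ in range(depth)]
user = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_user)]
candidates = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_item)]
              for _ in range(5)]

states = user_states(layers, user, n_user)  # computed once per request
max_diff = 0.0
for cand in candidates:
    full = forward_full(layers, user, cand)[n_user:]
    fast = forward_item(layers, states, cand, n_user)
    for row_full, row_fast in zip(full, fast):
        for a, b in zip(row_full, row_fast):
            max_diff = max(max_diff, abs(a - b))
print("largest difference between full and reused forward pass:", max_diff)
```

In this toy setting the per-candidate cost drops from mixing all tokens to mixing only the item rows, while the item outputs match the full forward pass. How UG-Sep itself structures the mask and what it does to preserve expressiveness is described in the paper, not here.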

Taxonomy

Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Big Data and Digital Economy