OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Zhaoqi Zhang, Haolei Pei, Jun Guo, Tianyu Wang, Yufei Feng, Hui Sun, Shaowei Liu, Aixin Sun

TL;DR
OneTrans is a unified Transformer backbone that performs user-behavior sequence modeling and feature interaction in a single model. Parameter sharing across sequential tokens and cross-request KV caching keep it efficient as it scales, and it improves recommendation performance over strong baselines.
Contribution
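It introduces a single Transformer backbone with a unified tokenizer that converts sequential and non-sequential attributes into one token sequence. Parameters are shared across similar sequential tokens and are token-specific for non-sequential tokens, so sequence modeling and feature interaction can be optimized and scaled together.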
Findings
Outperforms strong baselines on industrial datasets
Achieves a 5.68% lift in per-user GMV in online tests
Scales efficiently with increasing model size
Abstract
In recommendation systems, scaling up feature-interaction modules (e.g., Wukong, RankMixer) or user-behavior sequence modules (e.g., LONGER) has achieved notable success. However, these efforts typically proceed on separate tracks, which not only hinders bidirectional information exchange but also prevents unified optimization and scaling. In this paper, we propose OneTrans, a unified Transformer backbone that simultaneously performs user-behavior sequence modeling and feature interaction. OneTrans employs a unified tokenizer to convert both sequential and non-sequential attributes into a single token sequence. The stacked OneTrans blocks share parameters across similar sequential tokens while assigning token-specific parameters to non-sequential tokens. Through causal attention and cross-request KV caching, OneTrans enables precomputation and caching of intermediate representations,…
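The mixed parameterization and causal attention described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the token ordering (sequential tokens first, non-sequential tokens last), the single attention head, the missing feed-forward and normalization layers, the reuse of one parameter set across layers, and all names (`mixed_project`, `block`, `run`) are assumptions made for illustration. Under those assumptions, the sketch shows why caching is possible: with a causal mask, the representations of the sequential tokens do not depend on the non-sequential tokens, so they could be computed once and reused across requests.

```python
import numpy as np

def mixed_project(x, w_shared, w_specific, n_seq):
    """Project tokens with mixed parameters (hypothetical helper).

    The first n_seq rows (sequential tokens) share one matrix; every
    remaining row (non-sequential token) has its own matrix.
    """
    seq_out = x[:n_seq] @ w_shared                           # (n_seq, d)
    ns_out = np.einsum("td,tde->te", x[n_seq:], w_specific)  # (n_ns, d)
    return np.concatenate([seq_out, ns_out], axis=0)

def causal_attention(q, k, v):
    """Single-head attention where position i attends only to positions <= i."""
    n = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[1])
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def block(x, params, n_seq):
    """One simplified block: mixed Q/K/V projections, causal attention, residual."""
    q, k, v = (mixed_project(x, *params[name], n_seq) for name in ("q", "k", "v"))
    return x + causal_attention(q, k, v)

rng = np.random.default_rng(0)
d, n_seq, n_ns = 8, 5, 3  # toy sizes, chosen arbitrarily

# For each of Q, K, V: one shared matrix plus one matrix per non-sequential token.
params = {
    name: (rng.normal(size=(d, d)) / np.sqrt(d),
           rng.normal(size=(n_ns, d, d)) / np.sqrt(d))
    for name in ("q", "k", "v")
}

seq_tokens = rng.normal(size=(n_seq, d))  # e.g. embedded user behaviors
ns_a = rng.normal(size=(n_ns, d))         # non-sequential features, request A
ns_b = rng.normal(size=(n_ns, d))         # non-sequential features, request B

def run(ns_tokens, n_layers=2):
    """Stack blocks over [sequential tokens | non-sequential tokens]."""
    x = np.concatenate([seq_tokens, ns_tokens], axis=0)
    for _ in range(n_layers):  # simplification: same parameters in every layer
        x = block(x, params, n_seq)
    return x

out_a, out_b = run(ns_a), run(ns_b)

# Sequential-token outputs match across the two requests (cacheable);
# non-sequential-token outputs differ.
seq_rows_match = np.allclose(out_a[:n_seq], out_b[:n_seq])
ns_rows_match = np.allclose(out_a[n_seq:], out_b[n_seq:])
print(seq_rows_match, ns_rows_match)
```

In a serving setup built on this property, the keys and values of the sequential tokens at each layer would be stored once per user and only the non-sequential rows recomputed per request; the sketch recomputes everything and simply checks that the sequential rows are unaffected.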
