TriMLP: Revenge of a MLP-like Architecture in Sequential Recommendation
Yiheng Jiang, Yuanbo Xu, Yongjian Yang, Funing Yang, Pengyang Wang, and Hui Xiong

TL;DR
TriMLP is an MLP-like architecture for sequential recommendation whose Triangular Mixer blocks connections from future tokens, preventing information leakage under auto-regressive training; across 12 datasets it improves accuracy while lowering inference cost.
Contribution
The paper proposes TriMLP, a new MLP-like model with a triangular mixer that addresses information leakage and enhances sequential recommendation performance.
Findings
Achieves up to 14.88% accuracy improvement over baselines
Reduces inference cost by 8.65% on average
Evaluated on 12 datasets of different scales
Abstract
In this paper, we present an MLP-like architecture for sequential recommendation, namely TriMLP, with a novel Triangular Mixer for cross-token communications. In designing the Triangular Mixer, we simplify the cross-token operation in MLP as the basic matrix multiplication, and drop the lower-triangle neurons of the weight matrix to block the anti-chronological order connections from future tokens. Accordingly, the information leakage issue can be remedied and the prediction capability of MLP can be fully excavated under the standard auto-regressive mode. Taking a step further, the mixer serially alternates two delicate MLPs with triangular shape, tagged as global and local mixing, to separately capture the long-range dependencies and local patterns at a fine-grained level, i.e., long- and short-term preferences. Empirical study on 12 datasets of different scales (50K~10M user-item…
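The triangular masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the convention y[j] = sum over i of W[i][j] * x[i], under which dropping the lower triangle (i > j) means position j only sees tokens at positions i <= j. The block-diagonal "local" mask, the `block` parameter, and every function name below are hypothetical choices made for illustration; the abstract only states that the global and local mixers both have triangular shape and are applied serially.

```python
import random


def triangular_mask(n, block=None):
    """Return an n x n 0/1 mask that keeps W[i][j] only when i <= j.

    With `block` set, an entry is kept only if i and j also fall in the
    same contiguous block of that length (an assumed 'local' variant).
    """
    mask = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if block is None or i // block == j // block:
                mask[i][j] = 1.0
    return mask


def mix_tokens(x, w, mask):
    """Cross-token mixing as a masked matrix product.

    x is a list of n token embeddings (each a list of d floats).
    Output row j is sum_i mask[i][j] * w[i][j] * x[i].
    """
    n, d = len(x), len(x[0])
    y = [[0.0] * d for _ in range(n)]
    for j in range(n):
        for i in range(n):
            coeff = mask[i][j] * w[i][j]
            if coeff:  # masked (future) positions contribute nothing
                for k in range(d):
                    y[j][k] += coeff * x[i][k]
    return y


def triangular_mixer(x, w_global, w_local, block):
    """Apply global mixing, then local mixing, both with causal masks."""
    n = len(x)
    h = mix_tokens(x, w_global, triangular_mask(n))
    return mix_tokens(h, w_local, triangular_mask(n, block))


def random_matrix(rows, cols, rng):
    """Helper for the demo: a rows x cols matrix of Gaussian samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 6, 3
    x = random_matrix(n, d, rng)
    w_g, w_l = random_matrix(n, n, rng), random_matrix(n, n, rng)
    y = triangular_mixer(x, w_g, w_l, block=2)

    # Perturb only the last token: earlier outputs must not change.
    x_future = [row[:] for row in x]
    x_future[-1] = [100.0] * d
    y_future = triangular_mixer(x_future, w_g, w_l, block=2)
    assert y[:-1] == y_future[:-1]
    print("outputs before the last position are unaffected by the last token")
```

The check at the end is the point of the mask: changing a later token leaves every earlier output untouched, which is the property an auto-regressive model needs to avoid information leakage.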
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Human Mobility and Location-Based Analysis
Methods: Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Attention Is All You Need · Linear Layer · Label Smoothing · Adam
