Masked Graph Transformer for Large-Scale Recommendation

Huiyuan Chen; Zhe Xu; Chin-Chia Michael Yeh; Vivian Lai; Yan Zheng; Minghua Xu; Hanghang Tong

arXiv:2405.04028 · cs.IR · May 8, 2024

TL;DR

This paper introduces MGFormer, a scalable Masked Graph Transformer for large-scale recommendation that efficiently captures all-pair node interactions with linear complexity and outperforms existing methods.

Contribution

The paper presents MGFormer, a novel linear-complexity graph transformer that effectively models all-pair node interactions for large-scale recommendation tasks.
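The linear-complexity claim rests on kernelized attention. The identity below is a generic sketch of that idea, assuming N node tokens of width d and a positive feature map φ standing in for the softmax kernel; this page does not say which φ MGFormer uses. Reordering the products lets every node still attend to all N nodes without ever forming the N × N score matrix:

```latex
\mathrm{Attn}(Q, K, V)_i
  = \frac{\sum_{j=1}^{N} \phi(q_i)^{\top}\phi(k_j)\, v_j^{\top}}
         {\sum_{j=1}^{N} \phi(q_i)^{\top}\phi(k_j)}
  = \frac{\phi(q_i)^{\top}\bigl(\sum_{j=1}^{N} \phi(k_j)\, v_j^{\top}\bigr)}
         {\phi(q_i)^{\top}\bigl(\sum_{j=1}^{N} \phi(k_j)\bigr)}
```

Evaluating the left-hand form needs all N^2 pairwise scores, O(N^2 d) work. The right-hand form computes the d × d summary Σ_j φ(k_j) v_j^T and the vector Σ_j φ(k_j) once, in O(N d^2), and reuses them for every query, so cost grows linearly in N (taking value width d_v = d).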

Findings

01. MGFormer achieves better recommendation performance than the baselines it is compared against.
02. Its attention runs in time and memory linear in the number of nodes, which lets it scale to large graphs.
03. Even a single-layer MGFormer outperforms deeper models.

Abstract

Graph Transformers have garnered significant attention for learning on graph-structured data, owing to their strong ability to capture long-range dependencies among nodes. However, their quadratic space and time complexity hinders their scalability, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, that captures all-pair interactions among nodes with linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweight the attention scores. Experimental results show the superior performance of MGFormer, even with a single attention layer.
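The pipeline in the abstract (nodes as tokens, positional embeddings, kernelized attention, degree-based reweighting) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not MGFormer's implementation: the feature map φ(x) = elu(x) + 1 is borrowed from linear-transformer work, the weight w_j = ((deg_j + 1) / mean(deg + 1))^β with a scalar β is a hypothetical stand-in for the paper's learnable relative degree term, and the positional embeddings are random placeholders. The function names are illustrative only.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    """Positive feature map phi(x) = elu(x) + 1 (an assumed choice of kernel)."""
    # elu(x) + 1 equals x + 1 for x > 0 and exp(x) otherwise; clipping avoids overflow.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def relative_degree_weights(degrees: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical per-node weight ((deg + 1) / mean(deg + 1)) ** beta; beta would be learned."""
    shifted = np.asarray(degrees, dtype=float) + 1.0
    return (shifted / shifted.mean()) ** beta


def linear_attention(q, k, v, key_weights):
    """All-pair kernelized attention in O(N d^2) time, never building the N x N matrix."""
    phi_q = feature_map(q)                          # (N, d)
    phi_k = feature_map(k) * key_weights[:, None]   # (N, d), each key scaled by its degree weight
    kv = phi_k.T @ v                                # (d, d_v) summary over all N keys
    z = phi_k.sum(axis=0)                           # (d,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]      # (N, d_v)


def quadratic_reference(q, k, v, key_weights):
    """Same attention via the explicit N x N score matrix, O(N^2 d); for checking only."""
    scores = (feature_map(q) @ feature_map(k).T) * key_weights[None, :]  # (N, N)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


# Toy bipartite setting: 4 users and 6 items, all treated as independent tokens.
rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 8
n = n_users + n_items
node_emb = rng.normal(size=(n, d))
pos_emb = rng.normal(size=(n, d))        # placeholder positional embeddings
tokens = node_emb + pos_emb
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
degrees = rng.integers(1, 10, size=n)    # toy interaction counts per node
w = relative_degree_weights(degrees, beta=-0.5)

q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
out = linear_attention(q, k, v, w)
ref = quadratic_reference(q, k, v, w)
print(out.shape, np.allclose(out, ref))  # (10, 8) True
```

Because each degree weight multiplies a single key, it folds into the shared summaries kv and z and the cost stays linear in N; a weight that depended on the (i, j) pair would not factor this way.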

Taxonomy

Methods: Attention Is All You Need · Laplacian EigenMap · Laplacian Positional Encodings · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer
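The taxonomy tags Laplacian EigenMap and Laplacian Positional Encodings point to one common way of building node positional embeddings. The sketch below assumes that reading: it forms the user-item bipartite adjacency from a binary interaction matrix R and uses eigenvectors of the symmetric normalized Laplacian as k-dimensional positions. Whether MGFormer uses exactly this construction, and with which k, is not stated on this page; the function name is hypothetical.

```python
import numpy as np


def laplacian_positional_encoding(interactions: np.ndarray, k: int) -> np.ndarray:
    """Return (n_users + n_items, k) Laplacian-eigenvector positions for a bipartite graph."""
    n_users, n_items = interactions.shape
    n = n_users + n_items
    # Bipartite adjacency A = [[0, R], [R^T, 0]] with users first, then items.
    adj = np.zeros((n, n))
    adj[:n_users, n_users:] = interactions
    adj[n_users:, :n_users] = interactions.T
    deg = adj.sum(axis=1)
    # D^{-1/2}, with isolated nodes (degree 0) mapped to 0 instead of dividing by zero.
    inv_sqrt_deg = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(n) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    _, eigvecs = np.linalg.eigh(lap)     # eigenvalues returned in ascending order
    # Drop the first eigenvector (eigenvalue 0, carries no positional signal); keep the next k.
    return eigvecs[:, 1 : k + 1]


R = np.array([[1, 1, 0],
              [0, 1, 1]])               # 2 users x 3 items
pe = laplacian_positional_encoding(R, k=2)
print(pe.shape)                          # (5, 2)
```

Eigenvectors are defined only up to sign, so implementations commonly flip their signs at random during training. The dense np.linalg.eigh call costs O(N^3) and suits only toy graphs; at recommendation scale a sparse solver such as scipy.sparse.linalg.eigsh would be needed.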