Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation
Vivian Lai, Huiyuan Chen, Chin-Chia Michael Yeh, Minghua Xu, Yiwei Cai, Hao Yang

TL;DR
This paper introduces SAMRec, a loss-landscape-based approach that improves Transformer models for sequential recommendation, making them more data-efficient and robust without relying on self-supervised pre-training or data augmentation.
Contribution
The study proposes SAMRec, a novel regularization method inspired by loss geometry, to improve Transformer training in sparse-data settings without self-supervised learning.
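Since the taxonomy files SAMRec under sharpness-aware minimization (SAM), the sketch below shows what a generic SAM update looks like. It is not the authors' implementation: the quadratic objective, the names `loss`, `grad`, `sam_step`, and the values of `lr` and `rho` are all hypothetical, chosen only to make the two-gradient structure of SAM visible without a Transformer in the loop.

```python
import numpy as np

# Hypothetical toy objective (not from the paper): a quadratic with one
# high-curvature ("sharp") direction and one low-curvature ("flat") direction.
CURVATURE = np.array([10.0, 0.1])


def loss(w: np.ndarray) -> float:
    return float(0.5 * np.sum(CURVATURE * w ** 2))


def grad(w: np.ndarray) -> np.ndarray:
    # Analytic gradient of the toy quadratic above.
    return CURVATURE * w


def sam_step(w: np.ndarray, lr: float = 0.05, rho: float = 0.05) -> np.ndarray:
    """One generic SAM update; lr and rho are illustrative values."""
    g = grad(w)
    # First gradient evaluation: approximate the worst-case perturbation
    # inside an L2 ball of radius rho by moving along the normalized gradient.
    e = rho * g / (np.linalg.norm(g) + 1e-12)
    # Second gradient evaluation, at the perturbed weights; this gradient
    # is what updates the original weights w.
    return w - lr * grad(w + e)


w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w)
print(f"weights={w}, loss={loss(w):.4f}")
```

Because the perturbation always has length `rho`, the sharp coordinate in this toy run ends up within about `rho` of the minimum rather than exactly at it. In a real recommender, both gradient evaluations would come from backpropagation through the model, so each SAM step costs roughly two forward/backward passes.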
Findings
SAMRec outperforms standard Transformers in accuracy and robustness.
SAMRec achieves results comparable to those of state-of-the-art self-supervised models.
Transformers tend to converge to sharp minima without regularization.
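The finding about sharp minima depends on a notion of sharpness. This page does not say how the authors measured it, so the sketch below uses the neighbourhood definition that SAM targets: the largest increase in loss within a radius-ρ ball around the weights. The two 1D basins and the helper `sharpness` are hypothetical; both basins have the same minimum loss, so training loss alone cannot tell them apart, but their sharpness differs by their 100x curvature ratio. Grid search is exact enough in one dimension but does not scale to real model parameters.

```python
def sharpness(loss_fn, w: float, rho: float, num: int = 201) -> float:
    """Largest loss increase over perturbations |e| <= rho (1D grid search)."""
    base = loss_fn(w)
    # Uniform grid over [-rho, rho] with both endpoints included.
    perturbations = [-rho + 2.0 * rho * i / (num - 1) for i in range(num)]
    return max(loss_fn(w + e) for e in perturbations) - base


def sharp_basin(w: float) -> float:
    # Hypothetical high-curvature minimum at w = 0.
    return 50.0 * w ** 2


def flat_basin(w: float) -> float:
    # Hypothetical low-curvature minimum at w = 0, same loss value there.
    return 0.5 * w ** 2


rho = 0.1
print(sharpness(sharp_basin, 0.0, rho))  # ≈ 0.5
print(sharpness(flat_basin, 0.0, rho))   # ≈ 0.005
```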
Abstract
Transformers and their variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user's dynamic interests from past interactions. Despite their success, Transformer-based models often require optimizing a large number of parameters, which makes them difficult to train on the sparse data typical of sequential recommendation. To address data sparsity, previous studies have used self-supervised learning to enhance Transformers, for example by pre-training embeddings from item attributes or by applying contrastive data augmentations. However, these approaches encounter several training issues, including sensitivity to initialization, the need for manual data augmentations, and memory bottlenecks caused by large batch sizes. In this work, we investigate Transformers from the perspective of loss geometry, aiming to enhance the models' data efficiency and…
Taxonomy
Methods: Sharpness-Aware Minimization
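For reference, the standard sharpness-aware minimization objective replaces the training loss with its worst case over a neighbourhood of radius ρ and approximates the inner maximization to first order. This is the generic SAM formulation, not a statement of SAMRec's exact loss, which this page does not give; the symbols L (training loss), w (parameters), ρ (neighbourhood radius), and η (learning rate) are introduced here for illustration.

```latex
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2},
\qquad
w \leftarrow w - \eta \, \nabla_w L(w) \big|_{w + \hat{\epsilon}(w)}
```

The update therefore needs two gradient evaluations: one at w to compute the perturbation, and one at the perturbed point w + ε̂(w) to take the descent step.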
