SimAug: Enhancing Recommendation with Pretrained Language Models for Dense and Balanced Data Augmentation
Yuying Zhao, Xiaodong Yang, Huiyuan Chen, Xiran Fan, Yu Wang, Yiwei Cai, Tyler Derr

TL;DR
SimAug leverages pretrained language models to augment interaction data with textual similarity, effectively addressing data sparsity and imbalance in recommendation systems, leading to improved utility and fairness.
Contribution
We introduce SimAug, a simple, plug-and-play data augmentation method using PLMs to enhance recommendation data with textual similarity, improving performance and fairness.
Findings
Consistent improvements across nine datasets.
Enhanced recommendation utility with augmented data.
Improved fairness in recommendations.
Abstract
Deep Neural Networks (DNNs) are extensively used in collaborative filtering due to their impressive effectiveness. These systems depend on interaction data to learn the user and item embeddings that are crucial for recommendations. However, the data often suffers from sparsity and imbalance: limited observations of user-item interactions can result in sub-optimal performance, and a predominance of interactions with popular items may introduce recommendation bias. To address these challenges, we employ Pretrained Language Models (PLMs) to enhance the interaction data with textual information, leading to a denser and more balanced dataset. Specifically, we propose a simple yet effective data augmentation method (SimAug) based on the textual similarity from PLMs, which can be seamlessly integrated into any system as a lightweight, plug-and-play component in the pre-processing stage. Our…
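To make the pre-processing idea concrete, here is a minimal sketch of similarity-based interaction augmentation. It is an illustration under stated assumptions, not the authors' implementation: the function names `augment_interactions` and `density`, the `top_k` parameter, and the rule "add the k textually nearest items for each observed interaction" are all hypothetical, and the toy 2-D vectors stand in for embeddings that a PLM would produce from item text.

```python
import numpy as np


def augment_interactions(interactions, item_embeddings, top_k=1):
    """Return the observed (user, item) pairs plus, for each observed pair,
    the top_k items whose text embeddings are most similar to that item.

    Assumption: item_embeddings[i] is a PLM embedding of item i's text.
    The top-k rule is an illustrative choice, not the paper's stated procedure.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    unit = item_embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # An item should never be returned as its own neighbor.
    np.fill_diagonal(sim, -np.inf)

    # Indices of the top_k most similar items for every item.
    neighbors = np.argsort(-sim, axis=1)[:, :top_k]

    augmented = set(interactions)
    for user, item in interactions:
        for nb in neighbors[item]:
            augmented.add((user, int(nb)))
    return sorted(augmented)


def density(interactions, n_users, n_items):
    """Fraction of the user-item matrix that is observed."""
    return len(set(interactions)) / (n_users * n_items)


# Toy data: items 0 and 1 have similar text, as do items 2 and 3.
item_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
interactions = [(0, 0), (1, 2)]  # (user, item) pairs

augmented = augment_interactions(interactions, item_embeddings, top_k=1)
print(augmented)
print(density(interactions, 2, 4), density(augmented, 2, 4))
```

Because the augmentation only rewrites the interaction list, the downstream recommender needs no changes, which is what makes a step like this plug-and-play. Adding neighbors by textual similarity rather than by popularity is also what lets less popular items gain interactions in this sketch.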
Taxonomy
Topics: Recommender Systems and Techniques · Machine Learning in Healthcare · Topic Modeling
Methods: Simulation as Augmentation
