GRAM: Fast Fine-tuning of Pre-trained Language Models for Content-based Collaborative Filtering
Yoonseok Yang, Kyu Seok Kim, Minsam Kim, Juneyoung Park

TL;DR
GRAM introduces a gradient accumulation technique that accelerates training of content-based collaborative filtering (CCF) models built on pre-trained language models (PLMs), reducing resource consumption while maintaining performance.
Contribution
The paper proposes Single-step GRAM and Multi-step GRAM, which speed up training and use less memory, enabling efficient fine-tuning of PLM-based CCF models.
Findings
Up to 146x training speedup on multiple datasets.
Single-step GRAM is theoretically equivalent to standard end-to-end (E2E) training.
Reduces GPU memory usage substantially.
Abstract
Content-based collaborative filtering (CCF) predicts user-item interactions based on both users' interaction history and items' content information. Recently, pre-trained language models (PLM) have been used to extract high-quality item encodings for CCF. However, it is resource-intensive to train a PLM-based CCF model in an end-to-end (E2E) manner, since optimization involves back-propagating through every content encoding within a given user interaction sequence. To tackle this issue, we propose GRAM (GRadient Accumulation for Multi-modality in CCF), which exploits the fact that a given item often appears multiple times within a batch of interaction histories. Specifically, Single-step GRAM aggregates each item encoding's gradients for back-propagation, with theoretic equivalence to the standard E2E training. As an extension of Single-step GRAM, we propose Multi-step GRAM, which…
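The gradient-aggregation idea behind Single-step GRAM can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the paper's setup: a linear map `W` stands in for the PLM encoder, the loss is a per-position squared error against random targets, and the names `e2e_grad` and `accumulated_grad` are hypothetical. The sketch only shows that encoding each unique item once and summing its per-occurrence gradients before back-propagating gives the same encoder gradient as encoding every occurrence separately.

```python
import random

random.seed(0)
NUM_ITEMS, D_IN, D_OUT = 4, 5, 3


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


X = rand_matrix(NUM_ITEMS, D_IN)        # toy item content features
W = rand_matrix(D_OUT, D_IN)            # toy linear "encoder" (stand-in for a PLM)
item_ids = [0, 2, 0, 1, 2, 0]           # flattened batch of interaction positions
T = rand_matrix(len(item_ids), D_OUT)   # per-position targets for a squared-error loss


def encode(x):
    # e = W x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add_outer(grad, g, x):
    # grad += g x^T, the gradient of the loss w.r.t. W for one encoding
    for r in range(D_OUT):
        for c in range(D_IN):
            grad[r][c] += g[r] * x[c]


def e2e_grad():
    """Encode and back-propagate through every occurrence separately."""
    grad = [[0.0] * D_IN for _ in range(D_OUT)]
    n_encodes = 0
    for pos, i in enumerate(item_ids):
        e = encode(X[i])
        n_encodes += 1
        g = [ek - tk for ek, tk in zip(e, T[pos])]  # dLoss/de at this position
        add_outer(grad, g, X[i])
    return grad, n_encodes


def accumulated_grad():
    """Encode each unique item once; sum its gradients before back-propagating."""
    unique_ids = sorted(set(item_ids))
    enc = {i: encode(X[i]) for i in unique_ids}
    acc = {i: [0.0] * D_OUT for i in unique_ids}
    for pos, i in enumerate(item_ids):
        for k in range(D_OUT):
            acc[i][k] += enc[i][k] - T[pos][k]      # accumulate per-item gradient
    grad = [[0.0] * D_IN for _ in range(D_OUT)]
    for i in unique_ids:
        add_outer(grad, acc[i], X[i])               # one backward pass per unique item
    return grad, len(unique_ids)


g_e2e, n_e2e = e2e_grad()
g_acc, n_acc = accumulated_grad()
max_diff = max(abs(a - b) for ra, rb in zip(g_e2e, g_acc) for a, b in zip(ra, rb))
print(n_e2e, n_acc, max_diff < 1e-9)
```

In this toy batch the per-occurrence path runs the encoder 6 times and the accumulated path runs it 3 times, while the two gradients agree up to floating-point rounding. The saving grows with how often items repeat within a batch, which is the property the abstract says GRAM exploits.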
Taxonomy
Topics: Recommender Systems and Techniques · FinTech, Crowdfunding, Digital Finance
