CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data

Hossein Entezari Zarch; Abdulla Alshabanah; Chaoyi Jiang; Murali Annavaram

arXiv:2407.08108·cs.IR·July 25, 2024

TL;DR

This paper introduces CADC, a method that compresses training data for recommendation models by encoding user-item interactions into embeddings, enabling significant dataset reduction without sacrificing model accuracy.

Contribution

CADC proposes a novel two-step approach combining matrix factorization and sampling to effectively compress recommendation training data while preserving accuracy.
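As a rough illustration of the two steps, the Python sketch below first learns user and item embeddings from the full set of interactions with plain SGD matrix factorization, then keeps a uniform random subset of the interactions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `factorize` and `sample_interactions`, the squared-error objective, the hyperparameters, uniform sampling, and the toy interaction triples are all hypothetical, and how the enriched embeddings are consumed by the DLRM is not shown.

```python
import numpy as np


def factorize(interactions, n_users, n_items, dim=8, epochs=200,
              lr=0.05, reg=0.01, seed=0):
    """Hypothetical step 1: learn user/item embeddings from ALL
    (user, item, rating) triples with plain SGD matrix factorization
    (squared error plus L2 regularization; an assumed objective)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, dim))  # user embeddings
    V = 0.1 * rng.standard_normal((n_items, dim))  # item embeddings
    for _ in range(epochs):
        # Visit the observed interactions in a fresh random order each epoch.
        for idx in rng.permutation(len(interactions)):
            u, i, r = interactions[idx]
            err = r - U[u] @ V[i]          # residual of the dot-product prediction
            u_old = U[u].copy()            # keep the pre-update user vector
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V


def sample_interactions(interactions, keep_fraction, seed=0):
    """Hypothetical step 2: keep a uniform random subset of the interactions
    (uniform sampling is an assumption, not necessarily the paper's scheme)."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(keep_fraction * len(interactions))))
    keep = rng.choice(len(interactions), size=n_keep, replace=False)
    return [interactions[k] for k in sorted(keep)]


# Toy data: (user_id, item_id, rating) triples, invented for illustration.
interactions = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0),
                (2, 0, 1.0), (2, 2, 1.0), (1, 2, 0.0)]

U, V = factorize(interactions, n_users=3, n_items=3)      # uses the full history
compressed = sample_interactions(interactions, keep_fraction=0.5)
# A downstream model (not shown) would start from U and V and train on `compressed`.
print(len(interactions), "->", len(compressed), "interactions")
```

The point this sketch mirrors is that the factorization sees every interaction before sampling, so the collaborative signal is retained in `U` and `V` even though the model would train only on the smaller `compressed` list.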

Findings

01. Enriched embeddings capture interaction history effectively.

02. Sampling reduces dataset size drastically with minimal accuracy loss.

03. The method maintains recommendation quality with significantly less data.

Abstract

Deep learning recommendation models (DLRMs) are at the heart of the current e-commerce industry. However, the amount of training data used to train these large models is growing exponentially, leading to substantial training hurdles. The training dataset contains two primary types of information: content-based information (features of users and items) and collaborative information (interactions between users and items). One approach to reducing the training dataset is to remove user-item interactions. However, doing so significantly diminishes collaborative information, which is crucial for maintaining accuracy because it encodes interaction histories. This loss profoundly impacts DLRM performance. This paper makes an important observation: if one can capture the user-item interaction history to enrich the user and item embeddings, then the interaction history can be compressed without…

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Recommender Systems and Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network