On the Embedding Collapse when Scaling up Recommendation Models

Xingzhuo Guo, Junwei Pan, Ximei Wang, Baixu Chen, Jie Jiang, Mingsheng Long

arXiv:2310.04400 · cs.LG · June 7, 2024 · 1 citation

TL;DR

This paper investigates the embedding collapse phenomenon in large recommendation models, analyzing its causes and effects, and proposes a multi-embedding design with interaction modules to improve scalability and reduce collapse.

Contribution

It identifies embedding collapse as a key scalability issue and introduces a multi-embedding approach with interaction modules to mitigate it in recommendation models.
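A multi-embedding design can be pictured as several independent embedding tables for the same features, each feeding its own interaction module. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the class `MultiEmbedding`, the helper `pairwise_interaction`, the FM-style dot-product interaction, and the choice to average the per-set outputs are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def pairwise_interaction(emb):
    """FM-style interaction: inner products of every pair of fields.

    emb has shape (num_fields, dim) for a single sample; the result holds
    num_fields * (num_fields - 1) / 2 scores, one per field pair.
    """
    gram = emb @ emb.T  # (num_fields, num_fields) matrix of inner products
    rows, cols = np.triu_indices(emb.shape[0], k=1)  # strictly upper triangle
    return gram[rows, cols]


class MultiEmbedding:
    """Hypothetical sketch: num_sets independent embedding sets, one interaction each."""

    def __init__(self, vocab_sizes, dim, num_sets):
        # tables[s][f] is the (vocab_size, dim) table of field f in embedding set s.
        self.tables = [
            [rng.normal(scale=0.01, size=(v, dim)) for v in vocab_sizes]
            for _ in range(num_sets)
        ]

    def forward(self, feature_ids):
        # feature_ids holds one categorical id per field for a single sample.
        outputs = []
        for set_tables in self.tables:
            emb = np.stack([table[i] for table, i in zip(set_tables, feature_ids)])
            outputs.append(pairwise_interaction(emb))  # separate interaction per set
        # Assumption: average the per-set outputs before any prediction head.
        return np.mean(outputs, axis=0)


model = MultiEmbedding(vocab_sizes=[100, 50, 20], dim=8, num_sets=4)
print(model.forward([3, 7, 1]).shape)  # (3,)
```

With three fields there are three field pairs, so each forward pass yields a length-3 vector. Averaging keeps the output size independent of the number of embedding sets; concatenating instead would let a downstream MLP weight each set separately.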

Findings

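1. Embedding collapse restricts embedding learning and scalability.

2. Feature interaction has a two-sided effect: interacting with collapsed embeddings worsens collapse, yet interaction is crucial for mitigating the fitting of spurious features.

3. The proposed multi-embedding design improves scalability and reduces collapse across models.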

Abstract

Recent advances in foundation models have led to a promising trend of developing large recommendation models to leverage vast amounts of available data. Still, mainstream models remain embarrassingly small in size and naïve enlarging does not lead to sufficient performance gain, suggesting a deficiency in the model scalability. In this paper, we identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. Through empirical and theoretical analysis, we demonstrate a two-sided effect of feature interaction specific to recommendation models. On the one hand, interacting with collapsed embeddings restricts embedding learning and exacerbates the collapse issue. On the other hand, interaction is crucial in mitigating the fitting of spurious features as a scalability guarantee. Based on our…
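
To make "occupying a low-dimensional subspace" concrete, the reviews refer to the paper's information abundance metric. This is a minimal sketch, assuming the metric is the sum of the embedding matrix's singular values divided by the largest singular value, so it lies between 1 and the matrix rank; the function name `information_abundance` and the synthetic matrices are illustrative.

```python
import numpy as np


def information_abundance(emb):
    """Assumed definition: sum of singular values / largest singular value."""
    s = np.linalg.svd(emb, compute_uv=False)
    return s.sum() / s.max()


rng = np.random.default_rng(0)
num_rows, dim = 1000, 64

# Healthy case: rows spread over all 64 directions.
healthy = rng.normal(size=(num_rows, dim))

# Collapsed case: same shape, but every row lies in one 2-D subspace.
basis = rng.normal(size=(2, dim))
collapsed = rng.normal(size=(num_rows, 2)) @ basis

print(information_abundance(healthy))    # in the tens, never above 64
print(information_abundance(collapsed))  # between 1 and 2, up to rounding
```

Both matrices have 64 columns, but the collapsed one scores at most about 2, far below its nominal dimension, which is the pattern the abstract calls embedding collapse.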

Peer Reviews

Decision: ICML 2024 Poster

Reviewer 01 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 4

Strengths

Originality:
- The paper investigates the enlarged embedding layers of recommendation models and identifies a phenomenon of embedding collapse, wherein the embedding matrix tends to reside in a low-dimensional subspace. The discovery is novel as far as I know.
- The paper proposes information abundance to measure the degree of collapse for embedding matrices.
Quality:
- The paper is well-written. It starts with a novel finding of embedding collapse when increasing embedding dimension which mig…

Weaknesses

- In Section 3, the paper proposes Information Abundance to measure the degree of collapse of embedding matrices. As the paper focuses on the scaling law of embedding layers, the paper should discuss whether Information Abundance is a fair metric when comparing embedding matrices of different dimension sizes.
- In Section 4.2, the paper uses regularized DCNv2 as an example to show that suppressing feature interaction is insufficient for scalability. It is unclear to me why feature interaction in…
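
For reference on the model named in the second point: DCNv2 (DCN-V2) builds feature interactions with cross layers of the form x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l. The NumPy sketch here shows an unregularized cross layer only; the regularization the paper applies in Section 4.2 is not described on this page, so it is deliberately left out, and the names `cross_layer` and `dim` and the random weights are illustrative.

```python
import numpy as np


def cross_layer(x0, xl, weight, bias):
    """DCN-V2-style cross layer: x0 * (weight @ xl + bias) + xl, with * element-wise."""
    return x0 * (weight @ xl + bias) + xl


rng = np.random.default_rng(0)
dim = 6  # length of the flattened, concatenated field embeddings (illustrative)
x0 = rng.normal(size=dim)

# Stack two cross layers, each with its own dense weight matrix.
x = x0
for _ in range(2):
    weight = rng.normal(scale=0.1, size=(dim, dim))
    bias = np.zeros(dim)
    x = cross_layer(x0, x, weight, bias)

print(x.shape)  # (6,)
```

The residual term `+ xl` means a layer with zero weights and zero bias passes its input through unchanged, so each layer only adds higher-order interaction terms on top of what it receives.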

Reviewer 02 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 3

Strengths

S1. This paper provides empirical and theoretical analysis of the embedding collapse phenomenon.
S2. This paper proposes information abundance for quantifying the degree of collapse of embedding matrices with low-rank tendencies.

Weaknesses

W1. The novelty of this paper seems to be limited. The method of dividing the single embedding into multi-embedding sets is similar to DMRL [1] for disentangled representation learning. DMRL divides the feature representation of each modality into k chunks. As a result, the features of different factors are entangled.
W2. The motivation is not completely solid. The reason for increasing the embedding size of the model is inappropriate.
W3. The experimental results of the paper are insufficient.

Reviewer 03 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 4

Strengths

Originality:
- 'Information Abundance' as a novel quantitative measure of embedding-layer collapse.
- The 'Interaction-Collapse Law' and the two-sided effect of the feature interaction process help improve the understanding of embeddings' behavior in recommendation systems.
Quality:
- The authors provide a detailed exploration of embeddings and their behavior, particularly in the context of information collapse, with rigorous visualizations.
Significance:
- Broad Implications for Recommend…

Weaknesses

1. Insufficient Empirical Validation on Large-Scale Data: The authors have shown with empirical evidence that large-scale recommendation models scale poorly. However, it is common knowledge that large-scale models are inherently data-hungry and need more data to converge well. This is an important premise that the paper relies on; it would be good if the authors could follow up to prove or disprove it with additional data points in this paper. The experiments seem to be on the same amount of training data on…

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Machine Learning in Healthcare