TL;DR
This paper introduces MMSC, a self-supervised multi-modal framework that improves inference of substitutable and complementary items by effectively handling noisy user behavior data and sparse associations.
Contribution
MMSC combines multi-modal encoding, self-supervised denoising, hierarchical aggregation, and LLM-assisted supervision to improve item relationship inference.
Findings
MMSC outperforms baselines by 26.1% in substitutable item inference.
MMSC improves complementary item inference accuracy by 39.2%.
The method remains effective for cold-start items.
Abstract
We study the problem of inferring substitutable and complementary items, which underpins applications such as alternative and follow-up purchase suggestions. Existing approaches typically learn from behavior-derived item-item associations using GNNs or leverage item content alone. However, these methods often overlook two key challenges: (i) user behaviors (e.g., co-view/co-purchase) only provide noisy weak supervision, and (ii) behavior signals are long-tailed, leaving many items with sparse associations. We propose MMSC, a self-supervised multi-modal relational representation learning framework that combines a multi-modal foundation model adapted to encode item metadata and a self-supervised denoising module that learns relationship-aware representations from noisy user behaviors, unified by a hierarchical aggregation mechanism. We further use LLM-assisted supervision to mitigate…
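The two challenges named in the abstract, noisy weak supervision and long-tailed behavior signals, can be made concrete with a small sketch. It is not the MMSC method: it only builds behavior-derived item-item associations from co-occurrence in sessions and shows how a support threshold, used here as a crude noise filter, leaves tail items with no associations at all. The function names, the `min_support` parameter, and the toy sessions are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(sessions):
    """Count how often each unordered item pair appears in the same session."""
    pair_counts = Counter()
    for items in sessions:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def association_degree(pair_counts, min_support=1):
    """Count distinct partners per item, keeping pairs seen >= min_support times."""
    degree = Counter()
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            degree[a] += 1
            degree[b] += 1
    return degree


# Toy co-purchase sessions (hypothetical data, not from the paper).
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["tripod"],
    ["lens", "tripod"],
]

pair_counts = co_occurrence_counts(sessions)
# Requiring two co-occurrences is a simple stand-in for noise filtering.
degree = association_degree(pair_counts, min_support=2)

all_items = {item for session in sessions for item in session}
# Items with no surviving association: the sparse tail.
sparse_items = sorted(item for item in all_items if degree[item] == 0)
print(sparse_items)
```

In this toy data, thresholding removes the single `lens`/`tripod` co-purchase, so both items end up with no behavioral associations. That is the situation in which item content, rather than behavior alone, has to carry the relationship signal.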
