PixRec: Leveraging Visual Context for Next-Item Prediction in Sequential Recommendation

Sayak Chakrabarty; Souradip Pal

arXiv:2601.06458 · cs.IR · January 13, 2026

TL;DR

PixRec introduces a vision-language framework that integrates product images and textual attributes to significantly improve sequential recommendation accuracy in e-commerce, demonstrating the value of visual information.

Contribution

This work presents a novel multi-modal recommendation architecture that jointly processes image and text data, enhancing item differentiation beyond text-only models.

Findings

01  3x improvement in top-rank accuracy

02  40% improvement in top-10 accuracy

03  Effective integration of visual features in recommendation systems
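
The findings above refer to top-rank and top-10 accuracy without naming the exact metric. As a rough illustration only, the sketch below assumes a hit-rate style metric (the share of sequences whose true next item appears among the top k ranked candidates). The function name `hit_rate_at_k` and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_lists: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of sequences whose true next item is in the top-k candidates.

    Assumption: this is one common reading of "top-k accuracy"; the paper
    may use a different metric.
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: four user sequences, each with a ranked candidate
# list and the item the user actually interacted with next.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
truth = ["a", "a", "a", "b"]

top1 = hit_rate_at_k(ranked, truth, k=1)
top2 = hit_rate_at_k(ranked, truth, k=2)
print(f"hit rate @1 = {top1:.2f}, hit rate @2 = {top2:.2f}")
```

Under this reading, a "3x improvement in top-rank accuracy" would compare the k=1 value for two models, and the top-10 figure would compare the k=10 values.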

Abstract

Large Language Models (LLMs) have recently shown strong potential for sequential recommendation through text-only models that combine advanced prompt design, contrastive alignment, and fine-tuning on downstream domain-specific data. While effective, these approaches overlook the rich visual information present in many real-world recommendation scenarios, particularly in e-commerce. This paper proposes PixRec, a vision-language framework that incorporates both textual attributes and product images into the recommendation pipeline. Our architecture leverages a vision-language model backbone capable of jointly processing image-text sequences, maintaining a dual-tower structure and mixed training objective while aligning multi-modal feature projections for both item-item and user-item interactions. Using the Amazon Reviews dataset augmented with product images, our…
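
The abstract mentions a dual-tower structure with a mixed objective over user-item and item-item interactions but does not spell out the loss. The sketch below is one plausible reading, not the authors' implementation: it swaps in a standard in-batch InfoNCE contrastive loss for each interaction type and blends the two with a hypothetical weight `alpha`. The embeddings are assumed to be the outputs of the two towers, and all names here are illustrative.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.1) -> float:
    """In-batch contrastive loss: queries[i] should score highest against keys[i]."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-softmax of the positive pair
    return total / len(q)


def mixed_objective(
    user_vecs: List[Vector],
    next_item_vecs: List[Vector],
    item_vecs: List[Vector],
    paired_item_vecs: List[Vector],
    alpha: float = 0.5,  # hypothetical blending weight, not from the paper
) -> float:
    """Weighted sum of a user-item term and an item-item term."""
    user_item = info_nce(user_vecs, next_item_vecs)
    item_item = info_nce(item_vecs, paired_item_vecs)
    return alpha * user_item + (1.0 - alpha) * item_item


# Toy embeddings standing in for tower outputs (hypothetical values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
misaligned = list(reversed(aligned))

loss_good = info_nce(aligned, aligned)
loss_bad = info_nce(aligned, misaligned)
print(f"aligned loss = {loss_good:.4f}, misaligned loss = {loss_bad:.4f}")
```

In this toy setup the loss is near zero when each user embedding points at its own next item and large when the pairs are swapped, which is the behavior a contrastive alignment term is meant to produce.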

Taxonomy

Topics: Multimodal Machine Learning Applications · Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI)