RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation

Ali Tourani; Fatemeh Nazary; Yashar Deldjoo

arXiv:2506.20817 · cs.IR · February 17, 2026

TL;DR

This paper introduces RAG-VisualRec, an open multimodal resource and pipeline for movie recommendation. It combines LLM-generated plot descriptions with trailer-derived visual (and optional audio) embeddings, adds LLM-based quality control, and improves retrieval recall and ranking quality (nDCG).

Contribution

It presents RAG-VisualRec, an open resource combining multimodal data and LLM-based quality control for improved retrieval-augmented recommendation systems.

Findings

01. CCA fusion improves recall over unimodal baselines

02. LLM-based re-ranking enhances nDCG scores

03. Fusion strategies and retrieval depth significantly impact performance
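
For readers unfamiliar with the metric named in the re-ranking finding, the following is a minimal sketch of nDCG@k with binary relevance. It is not taken from the paper's code: the function names, the item IDs, and the choice of binary gains are all assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (ranks start at 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_items, relevant_items, k):
    """nDCG@k with binary relevance (assumption: an item is either relevant or not)."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # Ideal ranking places every relevant item first.
    ideal_gains = [1.0] * min(len(relevant_items), k)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical movie IDs: the same retrieved set before and after re-ranking.
relevant = {"m1", "m2"}
retrieved = ["m3", "m1", "m7", "m2"]
reranked = ["m1", "m2", "m3", "m7"]

ndcg_before = ndcg_at_k(retrieved, relevant, k=4)
ndcg_after = ndcg_at_k(reranked, relevant, k=4)
```

Because re-ranking only reorders a fixed candidate set, it can change nDCG but not recall at the same depth, which is why the two findings are reported against different metrics.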

Abstract

This paper addresses the challenge of building multimodal recommender systems for the movie domain, where sparse item metadata (e.g., title and genres) can limit retrieval quality and downstream recommendations. We introduce RAG-VisualRec, an open resource and reproducible pipeline that combines (i) LLM-generated item-side plot descriptions and (ii) trailer-derived visual (and optional audio) embeddings, supporting both retrieval-augmented generation (RAG) and collaborative-filtering style workflows. Our pipeline augments sparse metadata into richer textual signals and integrates modalities via configurable fusion strategies (e.g., PCA and CCA) before retrieval and optional LLM-based re-ranking. Beyond providing the resource, we provide a complementary analysis that increases transparency and reproducibility. In particular, we introduce LLMGenQC, a critic-based quality-control module…
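
The fusion-then-retrieval step described in the abstract can be sketched roughly as follows. This is an illustrative NumPy implementation of regularized CCA on synthetic data, not the authors' pipeline: the function names, the embedding sizes, the regularization value, and the decision to concatenate the two projected views are all assumptions.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def cca_fuse(text_emb, visual_emb, n_components=3, reg=1e-3):
    """Project both views into a shared CCA space and concatenate them.

    Returns (fused_embeddings, canonical_correlations).
    """
    n = text_emb.shape[0]
    x = text_emb - text_emb.mean(axis=0)
    y = visual_emb - visual_emb.mean(axis=0)
    # Regularized covariance and cross-covariance estimates.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    a = wx @ u[:, :n_components]
    b = wy @ vt.T[:, :n_components]
    fused = np.hstack([x @ a, y @ b])  # assumption: fuse by concatenation
    return fused, s[:n_components]

def top_k(item_matrix, query_vec, k=5):
    """Indices of the k items most cosine-similar to the query."""
    norms = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(query_vec)
    scores = item_matrix @ query_vec / np.maximum(norms, 1e-12)
    return np.argsort(-scores)[:k]

# Synthetic stand-ins for text and trailer embeddings sharing a latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
text_emb = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
visual_emb = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(200, 6))

fused, corrs = cca_fuse(text_emb, visual_emb, n_components=3)
neighbours = top_k(fused, fused[0], k=5)
```

In a real setting the query vector would be a user profile or query embedding projected into the same space. Here an item is used as its own query only to check that retrieval returns it first.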

Code & Models

Repositories

recsys-lab/rag-visualrec (official)

Datasets

alitourani/Popcorn_Dataset
dataset · 58k downloads

Taxonomy

Topics: Image Retrieval and Classification Techniques · Recommender Systems and Techniques · Multimodal Machine Learning Applications

Methods: Principal Components Analysis