RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation
Ali Tourani, Fatemeh Nazary, Yashar Deldjoo

TL;DR
This paper introduces RAG-VisualRec, a multimodal resource and pipeline for movie recommendation that combines LLM-generated textual descriptions with trailer-derived visual (and optional audio) embeddings, applies LLM-based quality control, and improves retrieval and ranking.
Contribution
It presents RAG-VisualRec, an open resource combining multimodal data and LLM-based quality control for improved retrieval-augmented recommendation systems.
Findings
CCA fusion improves recall over unimodal baselines
LLM-based re-ranking enhances nDCG scores
Fusion strategies and retrieval depth significantly impact performance
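To make the two metrics named in these findings concrete, here is a minimal sketch of recall@k and binary-relevance nDCG@k. The item ids, the toy rankings, and the function names are hypothetical illustrations, not the paper's evaluation code. The toy data shows why re-ranking a fixed candidate set can raise nDCG without changing recall at the full retrieval depth.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of `ranked`."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: two relevant movies among four retrieved candidates.
relevant = {"m1", "m4"}
retrieved = ["m3", "m1", "m2", "m4"]   # order returned by the retriever
reranked = ["m1", "m4", "m3", "m2"]    # same candidates after re-ranking

recall_retrieved = recall_at_k(retrieved, relevant, 4)
recall_reranked = recall_at_k(reranked, relevant, 4)
ndcg_retrieved = ndcg_at_k(retrieved, relevant, 4)
ndcg_reranked = ndcg_at_k(reranked, relevant, 4)
```

Recall at depth 4 is identical for both orderings because the candidate set is the same; only the position-sensitive nDCG rewards moving the relevant items to the top.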
Abstract
This paper addresses the challenge of building multimodal recommender systems for the movie domain, where sparse item metadata (e.g., title and genres) can limit retrieval quality and downstream recommendations. We introduce RAG-VisualRec, an open resource and reproducible pipeline that combines (i) LLM-generated item-side plot descriptions and (ii) trailer-derived visual (and optional audio) embeddings, supporting both retrieval-augmented generation (RAG) and collaborative-filtering style workflows. Our pipeline augments sparse metadata into richer textual signals and integrates modalities via configurable fusion strategies (e.g., PCA and CCA) before retrieval and optional LLM-based re-ranking. Beyond providing the resource, we provide a complementary analysis that increases transparency and reproducibility. In particular, we introduce LLMGenQC, a critic-based quality-control module…
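The fuse-then-retrieve flow described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it swaps in plain normalized concatenation as the fusion step in place of the PCA and CCA strategies the abstract names, and the embeddings, item ids, and function names are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_concat(text_vec, visual_vec):
    """Normalize each modality, then concatenate.

    Assumption: a simple stand-in for the PCA/CCA fusion named in the abstract.
    """
    return l2_normalize(text_vec) + l2_normalize(visual_vec)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def retrieve(query_vec, item_vecs, k):
    """Return the ids of the k items most similar to the query."""
    scored = sorted(
        item_vecs.items(),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:k]]

# Hypothetical 2-D text and visual embeddings for three movies.
items = {
    "a": fuse_concat([1.0, 0.0], [1.0, 0.0]),
    "b": fuse_concat([1.0, 0.0], [0.0, 1.0]),
    "c": fuse_concat([0.0, 1.0], [0.0, 1.0]),
}
# Hypothetical user profile expressed in the same fused space.
query = fuse_concat([2.0, 0.0], [3.0, 0.0])
top2 = retrieve(query, items, k=2)
```

The top-k list produced here is the candidate set that an optional LLM-based re-ranker would then reorder; the retrieval depth k controls how many candidates that stage sees.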
Taxonomy
Topics: Image Retrieval and Classification Techniques · Recommender Systems and Techniques · Multimodal Machine Learning Applications
Methods: Principal Components Analysis
