Large-scale Benchmarks for Multimodal Recommendation with Ducho

Matteo Attimonelli; Danilo Danese; Angela Di Fazio; Daniele Malitesta; Claudio Pomo; Tommaso Di Noia

arXiv:2409.15857·cs.IR·February 24, 2026

TL;DR

This paper introduces the first large-scale benchmark for multimodal recommender systems focusing on feature extractors, providing insights into training and tuning multimodal recommendation algorithms across various domains and modalities.

Contribution

It offers a unified experimental environment for benchmarking multimodal feature extractors in recommendation systems, filling a gap in systematic evaluation procedures.

Findings

01 Different extractors significantly impact recommendation performance.
02 Hyper-parameter tuning is crucial for optimal results.
03 Multimodal features improve recommendations across multiple domains.

Abstract

The common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, (iii) optionally fusing all multimodal features, and (iv) predicting the user-item score. Although great effort has been put into designing optimal solutions for (ii-iv), to the best of our knowledge, very little attention has been devoted to exploring procedures for (i) in a rigorous way. In this respect, the existing literature outlines the large availability of multimodal datasets and the ever-growing number of large models accounting for multimodal-aware tasks, but (at the same time) an unjustified adoption of limited standardized solutions. As very recent works from the literature have begun to conduct empirical studies to assess the contribution of multimodality in recommendation, we decide to follow and…
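
The four pipeline stages named in the abstract can be illustrated with a toy sketch. This is not the Ducho or Elliot API: random vectors stand in for pretrained extractor outputs, the projection weights are untrained, mean fusion and dot-product scoring are just one possible choice for each stage, and every function and variable name below is hypothetical.

```python
import random

def extract_features(item_ids, dim, seed):
    """(i) Stand-in for a pretrained extractor: one fixed random vector per item."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in item_ids}

def project(vec, weights):
    """(ii) Linear projection of a raw feature vector into the recommendation space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def fuse(vectors):
    """(iii) Element-wise mean over modalities (an assumed fusion choice)."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def score(user_vec, item_vec):
    """(iv) Dot-product user-item score."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

def random_matrix(rows, cols, rng):
    """Untrained projection weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
items = ["item_a", "item_b", "item_c"]
latent_dim = 4

# (i) Two modalities with different raw dimensionalities.
visual = extract_features(items, dim=8, seed=1)
textual = extract_features(items, dim=6, seed=2)

# (ii) One projection per modality into a shared latent space.
w_visual = random_matrix(latent_dim, 8, rng)
w_textual = random_matrix(latent_dim, 6, rng)

# (iii) Fuse the projected modalities into a single item embedding.
item_embeddings = {
    i: fuse([project(visual[i], w_visual), project(textual[i], w_textual)])
    for i in items
}

# (iv) Score every item for one user and rank by descending score.
user_embedding = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
scores = {i: score(user_embedding, item_embeddings[i]) for i in items}
ranking = sorted(items, key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this framing, swapping the feature extractor only changes stage (i), which is the stage the paper argues has received the least systematic study.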

Code & Models

Repositories

sisinflab/Ducho-meets-Elliot (official)

Taxonomy

Methods: Softmax · Attention Is All You Need · Focus