Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation
Daniele Malitesta, Giuseppe Gassi, Claudio Pomo, Tommaso Di Noia

TL;DR
Ducho is a unified, configurable framework that streamlines the extraction of multimodal features for recommendation systems by integrating multiple deep learning libraries and providing a shared, easy-to-use interface.
Contribution
The paper introduces a flexible, multi-backend framework with a YAML configuration system, enabling standardized and efficient multimodal feature extraction in recommendation tasks.
Findings
Provides a shared interface for TensorFlow, PyTorch, and Transformers
Enables configurable feature extraction via YAML files
Includes a Docker image with CUDA support and demo scenarios
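
The idea of a configuration-driven extraction behind one shared interface can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not Ducho's actual API or YAML schema: the configuration keys, the model and layer names, and the names ExtractionJob, register, build_jobs, and run are all hypothetical. A plain dictionary stands in for a parsed YAML file, and the backend functions are stubs that return labels instead of loading real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical configuration, standing in for a parsed YAML file.
# The keys and values are illustrative only, NOT Ducho's real schema.
CONFIG = {
    "visual": {"backend": "torch", "model": "model_a", "output_layer": "layer_x"},
    "textual": {"backend": "transformers", "model": "model_b", "output_layer": "layer_y"},
}


@dataclass(frozen=True)
class ExtractionJob:
    """One (modality, backend, model, layer) combination to extract features with."""
    modality: str
    backend: str
    model: str
    output_layer: str


# Registry mapping a backend name to a function with a common signature.
Extractor = Callable[[ExtractionJob, List[str]], List[str]]
EXTRACTORS: Dict[str, Extractor] = {}


def register(backend: str) -> Callable[[Extractor], Extractor]:
    """Decorator that adds an extractor function to the registry."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[backend] = fn
        return fn
    return wrap


def _stub_extractor(job: ExtractionJob, items: List[str]) -> List[str]:
    # Stub: a real backend would load the model and return feature arrays.
    return [f"{job.backend}:{job.model}:{job.output_layer}:{item}" for item in items]


# The three backends named in the summary, all served by the same stub here.
for _name in ("tensorflow", "torch", "transformers"):
    register(_name)(_stub_extractor)


def build_jobs(config: Dict[str, Dict[str, str]]) -> List[ExtractionJob]:
    """Turn the configuration mapping into a list of extraction jobs."""
    return [ExtractionJob(modality=m, **spec) for m, spec in config.items()]


def run(job: ExtractionJob, items: List[str]) -> List[str]:
    """Dispatch a job to its backend through the shared interface."""
    if job.backend not in EXTRACTORS:
        raise ValueError(f"unknown backend: {job.backend}")
    return EXTRACTORS[job.backend](job, items)


if __name__ == "__main__":
    for job in build_jobs(CONFIG):
        print(job.modality, run(job, ["item_1", "item_2"]))
```

The point of the registry is that the calling code never changes when a backend is added or swapped; only the configuration does, which is what makes extraction settings easy to share and compare across recommendation frameworks.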
Abstract
In multimodal-aware recommendation, the extraction of meaningful multimodal features is at the basis of high-quality recommendations. Generally, each recommendation framework implements its multimodal extraction procedures with specific strategies and tools. This is limiting for two reasons: (i) different extraction strategies do not ease the interdependence among multimodal recommendation frameworks; thus, they cannot be efficiently and fairly compared; (ii) given the large plethora of pre-trained deep learning models made available by different open source tools, model designers do not have access to shared interfaces to extract features. Motivated by the outlined aspects, we propose Ducho, a unified framework for the extraction of multimodal features in recommendation. By integrating three widely-adopted deep learning libraries as backends, namely, TensorFlow, PyTorch, and…
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Multimodal Machine Learning Applications
