TL;DR
DIME is a web-based, modality-agnostic tool designed for quick quantitative and qualitative evaluation and comparison of cross-modal retrieval models across different data types like images, text, and videos.
Contribution
The paper introduces DIME, a novel online platform that simplifies the evaluation and comparison of cross-modal retrieval models using a user-friendly interface and flexible data handling.
Findings
Supports multimodal datasets and models
Enables building queryable indexes and feature extraction
Facilitates efficient dataset exploration and search
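To make the idea of a modality-agnostic queryable index concrete, here is a minimal sketch. It assumes every item (image, text, or video) has already been mapped into one shared embedding space by some model, and it ranks items by cosine similarity with a brute-force scan. The class `EmbeddingIndex`, its methods, and the toy vectors are hypothetical illustrations, not DIME's actual API or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot index or query with a zero vector")
    return [x / norm for x in vec]


class EmbeddingIndex:
    """Hypothetical brute-force index over embeddings (not DIME's implementation).

    The index only sees vectors, so it does not care which modality
    an item or a query originally came from.
    """

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, item_id: str, embedding: Sequence[float]) -> None:
        """Store a unit-normalized copy of the embedding under item_id."""
        self._items[item_id] = _normalize(embedding)

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        """Return the k items with the highest cosine similarity to the query."""
        q = _normalize(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, emb)))
            for item_id, emb in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy embeddings standing in for the output of an image/video encoder (assumed values).
index = EmbeddingIndex()
index.add("image_cat", [0.9, 0.1, 0.0])
index.add("image_dog", [0.1, 0.9, 0.0])
index.add("video_car", [0.0, 0.1, 0.9])

# Toy embedding standing in for a text query such as "a cat" (assumed values).
text_query_embedding = [0.8, 0.2, 0.0]
results = index.search(text_query_embedding, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

A linear scan like this is only practical for small collections; a real system would more likely use an approximate nearest-neighbor structure, which this sketch does not attempt to model.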
Abstract
Cross-modal retrieval relies on accurate models to retrieve relevant results for queries across modalities such as image, text, and video. In this paper, we build upon previous work by tackling the difficulty of quickly evaluating models both quantitatively and qualitatively. We present DIME (Dataset, Index, Model, Embedding), a modality-agnostic tool that handles multimodal datasets, trained models, and data preprocessors to support straightforward model comparison through a browser-based graphical user interface. DIME inherently supports building modality-agnostic queryable indexes and extracting relevant feature embeddings, and thus doubles as an efficient cross-modal tool for exploring and searching datasets.
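As an illustration of what a quantitative comparison between two retrieval models can look like, the sketch below computes Recall@K, a metric commonly used in cross-modal retrieval. The abstract does not say which metrics DIME reports, so the choice of metric is an assumption, as are the function name `recall_at_k`, the one-relevant-item-per-query setup, and the toy rankings.

```python
from typing import Dict, List


def recall_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose single relevant item appears in the top-k results.

    rankings maps a query id to item ids ordered from best to worst.
    relevant maps a query id to its one ground-truth item id (assumed setup).
    """
    if not rankings:
        raise ValueError("no queries to evaluate")
    hits = sum(1 for query_id, ranked in rankings.items() if relevant[query_id] in ranked[:k])
    return hits / len(rankings)


# Toy ground truth and toy rankings from two hypothetical models.
ground_truth = {"q1": "img_a", "q2": "img_b", "q3": "img_c"}
model_a_rankings = {
    "q1": ["img_a", "img_b", "img_c"],
    "q2": ["img_c", "img_b", "img_a"],
    "q3": ["img_a", "img_b", "img_c"],
}
model_b_rankings = {
    "q1": ["img_a", "img_c", "img_b"],
    "q2": ["img_b", "img_a", "img_c"],
    "q3": ["img_b", "img_c", "img_a"],
}

for name, rankings in [("model A", model_a_rankings), ("model B", model_b_rankings)]:
    r1 = recall_at_k(rankings, ground_truth, k=1)
    r2 = recall_at_k(rankings, ground_truth, k=2)
    print(f"{name}: R@1={r1:.2f} R@2={r2:.2f}")
```

Scores like these cover only the quantitative side; the qualitative side described in the abstract would amount to inspecting the retrieved items for individual queries.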
