Full-Network Embedding in a Multimodal Embedding Pipeline
Armand Vilalta, Dario Garcia-Gasulla, Ferran Parés, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura

TL;DR
This paper demonstrates that using Full-Network embeddings, which provide multi-scale image representations, improves performance in image annotation and retrieval tasks over traditional one-layer embeddings, across multiple datasets.
Contribution
The paper introduces the use of Full-Network embeddings in multimodal image-text embedding schemes, showing consistent performance improvements over one-layer embeddings.
Findings
Full-Network embeddings outperform one-layer embeddings in image annotation.
Full-Network embeddings improve image retrieval accuracy.
The approach is flexible and can be integrated into existing multimodal schemes.
Abstract
The current state-of-the-art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure the influence of the Full-Network embedding, we evaluate its performance on three different datasets, and compare the results with the original multimodal embedding generation scheme when using a one-layer image embedding, and with the rest of the state-of-the-art. Results for image…
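The contrast the abstract draws between a one-layer image embedding and a multi-scale one can be illustrated with a small sketch. This is not the authors' implementation: the function names (`one_layer_embedding`, `multi_layer_embedding`, `rank_by_cosine`), the toy activation shapes, and the choice of spatial average pooling for convolutional maps are all assumptions made for illustration. The sketch only shows the general idea of building an image vector from several layers instead of one, and of ranking candidates by cosine similarity once images and text share an embedding space.

```python
import numpy as np


def one_layer_embedding(activations):
    """Hypothetical baseline: use only the last layer's activations."""
    return np.asarray(activations[-1]).ravel()


def multi_layer_embedding(activations):
    """Hypothetical multi-scale embedding: one vector per layer, concatenated.

    Assumption: convolutional maps of shape (channels, height, width) are
    reduced to (channels,) by spatial average pooling; other layers are
    flattened as they are.
    """
    parts = []
    for act in activations:
        act = np.asarray(act)
        if act.ndim == 3:
            parts.append(act.mean(axis=(1, 2)))
        else:
            parts.append(act.ravel())
    return np.concatenate(parts)


def rank_by_cosine(query, candidates):
    """Return candidate indices sorted from most to least similar to query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)


# Toy activations standing in for the outputs of three network layers.
rng = np.random.default_rng(0)
toy_activations = [
    rng.normal(size=(8, 4, 4)),   # early convolutional layer
    rng.normal(size=(16, 2, 2)),  # later convolutional layer
    rng.normal(size=(32,)),       # fully connected layer
]

single = one_layer_embedding(toy_activations)
multi = multi_layer_embedding(toy_activations)
print(single.shape, multi.shape)
```

In a retrieval setting, `query` would be a text vector and `candidates` the image vectors (or the reverse for annotation), both already projected into the shared space; that projection is learned by the multimodal scheme and is outside the scope of this sketch.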
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
