Efficient Multi-Modal Embeddings from Structured Data
Anita L. Verő, Ann Copestake

TL;DR
This paper introduces a new multi-modal embedding that combines structured visual data with linguistic information, improving semantic understanding efficiently without relying on direct visual input.
Contribution
The work presents a novel embedding type based on structured visual annotations that enhances text embeddings with visual context in a resource-efficient manner.
Findings
The new embedding provides complementary information to text-based embeddings.
It achieves performance comparable to visual models while using significantly fewer resources.
Structured visual data improves semantic similarity tasks without direct visual input.
Abstract
Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input; however, visual grounding can contribute to linguistic applications as well. Another motivation for this paper is the growing need for more interpretable models and for evaluating model efficiency regarding size and performance. This work explores the impact of visual information for semantics when the evaluation involves no direct visual input, specifically semantic similarity and relatedness. We investigate a new embedding type in-between linguistic and visual modalities, based on the structured annotations of Visual Genome. We compare uni- and multi-modal models including structured, linguistic and image-based representations. We measure the efficiency of each model…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
