Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision
KMA Solaiman, Bharat Bhargava

TL;DR
FemmIR is a novel multimodal retrieval framework that uses edit-distance-based weak supervision to retrieve relevant data without similarity labels or extensive fine-tuning. It is evaluated on a new dataset, MuQNOL.
Contribution
The paper introduces FemmIR, a weakly supervised multimodal retrieval method that reuses pretrained encoders and employs edit distance for relevance measurement, avoiding schema mapping and annotation overhead.
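The core idea, edit distance as a label-free relevance signal, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. Each query and candidate (text, image, or video) is assumed to be already reduced to a list of property values in a fixed attribute order, for example by pretrained encoders or attribute extractors. A plain weighted Levenshtein distance stands in for the paper's edit-distance formulation, which this summary does not specify. The functions `edit_distance`, `weak_relevance`, and `rank`, the unit costs, and the `1 / (1 + d)` score are all hypothetical illustrations.

```python
from collections.abc import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str],
                  sub_cost: float = 1.0, indel_cost: float = 1.0) -> float:
    """Weighted Levenshtein distance between two property sequences (illustrative stand-in)."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum cost to turn a[:i] into b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * indel_cost  # delete all of a[:i]
    for j in range(1, n + 1):
        dp[0][j] = j * indel_cost  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = a[i - 1] == b[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                        # delete a[i-1]
                dp[i][j - 1] + indel_cost,                        # insert b[j-1]
                dp[i - 1][j - 1] + (0.0 if same else sub_cost),   # match or substitute
            )
    return dp[m][n]


def weak_relevance(query: Sequence[str], candidate: Sequence[str]) -> float:
    """Hypothetical score: fewer edits means higher relevance; no similarity label is used."""
    return 1.0 / (1.0 + edit_distance(query, candidate))


def rank(query: Sequence[str],
         candidates: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidate ids by descending weak relevance to the query."""
    scored = [(cid, weak_relevance(query, props)) for cid, props in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical missing-person query, attributes in a fixed order:
    # gender, age group, top, bottom
    query = ["male", "adult", "red_jacket", "blue_jeans"]
    candidates = {
        "img_01": ["male", "adult", "red_jacket", "black_jeans"],
        "txt_07": ["female", "child", "red_jacket", "blue_jeans"],
        "vid_03": ["male", "adult", "red_jacket", "blue_jeans"],
    }
    for cid, score in rank(query, candidates):
        print(cid, round(score, 3))
```

A candidate that needs fewer property edits to match the query ranks higher. The ordering comes only from edit costs, which mirrors the summary's point that relevance is measured without similarity annotations.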
Findings
FemmIR achieves performance comparable to existing systems on a missing-person retrieval task.
The approach exploits high-level property constraints and the implicit relevance signal carried by edit distances.
FemmIR operates without the need for similarity labels or extensive fine-tuning.
Abstract
Existing multi-media retrieval models either rely on creating a common subspace with modality-specific representation models or require schema mapping among modalities to measure similarities among multi-media data. Our goal is to avoid the annotation overhead incurred from considering retrieval as a supervised classification task and re-use the pretrained encoders in large language models and vision tasks. We propose "FemmIR", a framework to retrieve multimodal results relevant to information needs expressed with multimodal queries by example without any similarity label. Such identification is necessary for real-world applications where data annotations are scarce and satisfactory performance is required without fine-tuning with a common framework across applications. We curate a new dataset called MuQNOL for benchmarking progress on this task. Our technique is based on weak…
