The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching
Qian Yu, Xiaobin Chang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

TL;DR
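This paper introduces a unified framework that fuses mid-level features from earlier layers of a deep neural network with the high-level features of the final layer. The goal is better cross-domain instance matching on tasks such as fine-grained sketch-based image retrieval (FG-SBIR) and person re-identification (ReID), where the framework outperforms more complex models.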
Contribution
The paper shows that extracting mid-level features from earlier DNN layers and fusing them with the final-layer output improves cross-domain matching, challenging the reliance on high-level features alone.
Findings
Mid-level features improve matching accuracy when fused with high-level features.
The simple fusion framework outperforms more complex architectures.
The framework is effective for both FG-SBIR and person ReID.
Abstract
Many vision problems require matching images of object instances across different domains. These include fine-grained sketch-based image retrieval (FG-SBIR) and Person Re-identification (person ReID). Existing approaches attempt to learn a joint embedding space where images from different domains can be directly compared. In most cases, this space is defined by the output of the final layer of a deep neural network (DNN), which primarily contains features of a high semantic level. In this paper, we argue that both high and mid-level features are relevant for cross-domain instance matching (CDIM). Importantly, mid-level features already exist in earlier layers of the DNN. They just need to be extracted, represented, and fused properly with the final layer. Based on this simple but powerful idea, we propose a unified framework for CDIM. Instantiating our framework for FG-SBIR and ReID, we…
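The abstract's idea of extracting mid-level features and fusing them with the final layer can be illustrated with a small sketch. This summary does not say how the paper represents or fuses the features, so everything below is an assumption made for illustration: global average pooling of each mid-level feature map, L2 normalization, plain concatenation with the final embedding, and Euclidean distance for matching. All function names are hypothetical, and the feature maps are toy nested lists rather than real network activations.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / (norm + eps) for v in vector]


def fuse_features(mid_feature_maps, final_embedding):
    """Assumed fusion: pool and normalize each mid-level map, then
    concatenate with the normalized final-layer embedding."""
    fused = []
    for feature_map in mid_feature_maps:
        fused.extend(l2_normalize(global_average_pool(feature_map)))
    fused.extend(l2_normalize(final_embedding))
    return l2_normalize(fused)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query, gallery):
    """Return gallery indices ordered from closest to farthest."""
    distances = [euclidean_distance(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: distances[i])


# Toy data: one mid-level layer with 2 channels of size 2x2, plus a 2-D final embedding.
sketch_mid = [[[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]]
sketch_final = [1.0, 0.0]

matching_photo_mid = [[[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]]
matching_photo_final = [1.0, 0.0]

other_photo_mid = [[[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]]
other_photo_final = [0.0, 1.0]

query = fuse_features(sketch_mid, sketch_final)
gallery = [
    fuse_features(other_photo_mid, other_photo_final),
    fuse_features(matching_photo_mid, matching_photo_final),
]
ranking = rank_gallery(query, gallery)
print(ranking)
```

In a real system the mid-level maps would come from intermediate layers of the trained network for each domain (for example, sketch and photo branches), and the fusion step would typically be learned jointly with the embedding rather than fixed as it is here.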
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Multimodal Machine Learning Applications
