TL;DR
Conditional Similarity Networks (CSNs) enable learning interpretable, multi-faceted image embeddings by disentangling features into semantic subspaces, outperforming specialized models in capturing diverse similarity notions.
Contribution
The paper introduces CSNs, a novel approach that learns disentangled, multi-faceted image embeddings with semantic subspaces and masks, capturing multiple similarity notions in a single model.
Findings
CSNs learn interpretable, semantically meaningful subspaces.
CSNs outperform specialized networks on multiple similarity tasks.
Embeddings are visually and semantically disentangled.
Abstract
What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space, in which their distance preserve the relative dissimilarity. However, when learning such similarity embeddings the simplifying assumption is commonly made that images are only compared to one unique measure of similarity. A main reason for this is that contradicting notions of similarities cannot be captured in a single space. To address this shortcoming, we propose Conditional Similarity Networks (CSNs) that learn embeddings differentiated into semantically distinct subspaces that capture the different notions of similarities. CSNs jointly learn a disentangled embedding where features for different similarities are encoded in separate dimensions as well as masks that select and reweight relevant dimensions to induce a subspace that encodes a specific similarity…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
