Substance or Style: What Does Your Image Embedding Know?

Cyrus Rashtchian; Charles Herrmann; Chun-Sung Ferng; Ayan Chakrabarti; Dilip Krishnan; Deqing Sun; Da-Cheng Juan; Andrew Tomkins

arXiv:2307.05610 · cs.LG · July 13, 2023

TL;DR

This paper systematically investigates the non-semantic information in image embeddings from several models. It shows that what an embedding captures depends on the training algorithm, which in turn affects how well the embedding suits style and transformation recognition tasks.

Contribution

The paper introduces a systematic transformation prediction task for analyzing the non-semantic content of image embeddings across multiple models, and uses it to highlight the impact of training methods.

Findings

01. Six embeddings encode non-semantic transformation information.

02. CLIP and ALIGN outperform masking-based models in style transfer recognition.

03. Model training algorithms influence the types of information captured in embeddings.

Abstract

Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of…
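The probing setup the abstract describes can be illustrated with a minimal sketch: freeze an encoder, embed transformed images, and train a small classifier to predict which transformation was applied. Everything below is an assumption made for illustration and is not the paper's code. The encoder is a fixed random projection standing in for a pretrained model (it is not MAE, SimCLR, or CLIP), the images are synthetic, and the three transformations and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transformations; the label is the index into this list.
TRANSFORMS = [
    lambda x: x,        # identity
    lambda x: 1.0 - x,  # invert intensities
    lambda x: 0.5 * x,  # darken
]


def make_dataset(n_images, side=8):
    """Apply every transformation to every synthetic image."""
    # Squaring skews pixels toward dark values, so inversion is detectable.
    base = rng.random((n_images, side * side)) ** 2
    images = np.concatenate([t(base) for t in TRANSFORMS])
    labels = np.repeat(np.arange(len(TRANSFORMS)), n_images)
    return images, labels


# Stand-in for a frozen pretrained encoder (assumption: random projection).
ENCODER_WEIGHTS = rng.normal(size=(64, 32))


def embed(images):
    """Map flattened 8x8 images to 32-dimensional embeddings."""
    return images @ ENCODER_WEIGHTS


def train_linear_probe(z, y, n_classes, lr=0.1, steps=500):
    """Softmax regression on frozen embeddings, full-batch gradient descent."""
    w = np.zeros((z.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = z @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(z)
        w -= lr * (z.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def probe_accuracy(w, b, z, y):
    """Fraction of examples whose transformation is predicted correctly."""
    return float(((z @ w + b).argmax(axis=1) == y).mean())


train_x, train_y = make_dataset(300)
test_x, test_y = make_dataset(100)
train_z, test_z = embed(train_x), embed(test_x)

# Standardize with training statistics only; the encoder itself stays frozen.
mu, sigma = train_z.mean(axis=0), train_z.std(axis=0)
train_z, test_z = (train_z - mu) / sigma, (test_z - mu) / sigma

w, b = train_linear_probe(train_z, train_y, len(TRANSFORMS))
accuracy = probe_accuracy(w, b, test_z, test_y)
print(f"probe accuracy: {accuracy:.2f} (chance: {1 / len(TRANSFORMS):.2f})")
```

Only the probe's weights are trained here, so accuracy above chance indicates that the embedding itself retains information about the transformation. To probe a real model under the same protocol, `embed` would be replaced by that model's frozen feature extractor.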

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Authorship Attribution and Profiling

Methods: Average Pooling · Batch Normalization · Global Average Pooling · 1x1 Convolution · Max Pooling · Residual Block · Dense Connections · Residual Connection · Random Resized Crop