Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

Ryan Ramos; Vladan Stojnić; Giorgos Kordopatis-Zilos; Yuta Nakashima; Giorgos Tolias; Noa Garcia

arXiv:2508.10637 · cs.CV · April 2, 2026

TL;DR

This paper investigates how visual encoders like CLIP encode subtle camera and image acquisition parameters, revealing their influence on semantic predictions and the potential for these parameters to be recovered from learned representations.

Contribution

The paper demonstrates that acquisition and processing parameters are systematically encoded in visual representations and can significantly affect semantic predictions, which adds a new dimension to the interpretability of visual encoders.

Findings

01 Acquisition parameters are systematically encoded in visual representations.

02 These parameters can be recovered from the learned features.

03 Their presence can positively or negatively influence semantic predictions.

Abstract

Prior work has analyzed the robustness of visual encoders to image transformations and corruptions, particularly in cases where such alterations are not seen during training. When this occurs, they introduce a form of distribution shift at test time, often leading to performance degradation. The primary focus has been on severe corruptions that, when applied aggressively, distort useful signals necessary for accurate semantic predictions. We take a different perspective by analyzing parameters of the image acquisition process and transformations that may be subtle or even imperceptible to the human eye. We find that such parameters are systematically encoded in the learned visual representations and can be easily recovered. More strikingly, their presence can have a profound impact, either positively or negatively, on semantic predictions. This effect depends on whether there is a…
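
Whether a parameter "can be easily recovered" from a representation is commonly checked with a linear probe trained on frozen features. The sketch below illustrates that idea only and is not the authors' code. It assumes synthetic vectors in place of real encoder embeddings and swaps in a closed-form ridge-regression classifier on one-hot targets for whatever probe the paper actually trains. All names (`make_synthetic_features`, `fit_linear_probe`, `run_probe`, `trace_strength`) are hypothetical.

```python
import numpy as np


def make_synthetic_features(n, dim, n_classes, trace_strength, rng):
    """Synthetic stand-in for frozen encoder embeddings (NOT real CLIP features).

    Each sample gets a hypothetical acquisition label (e.g. a camera model id).
    Its feature vector is Gaussian noise plus a label-specific offset scaled
    by trace_strength, which mimics a trace left in the representation.
    """
    labels = rng.integers(0, n_classes, size=n)
    offsets = rng.normal(size=(n_classes, dim))
    feats = rng.normal(size=(n, dim)) + trace_strength * offsets[labels]
    return feats, labels


def add_bias(X):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear_probe(X, y, n_classes, l2=1e-2):
    """Ridge regression onto one-hot labels: a simple closed-form linear probe."""
    Xb = add_bias(X)
    Y = np.eye(n_classes)[y]
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = np.argmax(add_bias(X) @ W, axis=1)
    return float(np.mean(preds == y))


def run_probe(trace_strength, n=2000, dim=64, n_classes=4, seed=0):
    """Fit the probe on the first half of the data, score it on the second half."""
    rng = np.random.default_rng(seed)
    X, y = make_synthetic_features(n, dim, n_classes, trace_strength, rng)
    half = n // 2
    W = fit_linear_probe(X[:half], y[:half], n_classes)
    return probe_accuracy(W, X[half:], y[half:])


if __name__ == "__main__":
    print("trace present:", run_probe(trace_strength=0.5))
    print("trace absent: ", run_probe(trace_strength=0.0))
```

With a nonzero `trace_strength` the probe should score well above chance (1 / `n_classes`) on held-out samples, and with `trace_strength=0` it should stay near chance. On real data, the feature matrix would come from a frozen encoder and the labels from image metadata, for example EXIF fields.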

Code & Models

Repositories

ryan-caesar-ramos/visual-encoder-traces (GitHub)

Datasets

CTU-OU/FlickrExif
