Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services

Shaopeng Fu; Xuexue Sun; Ke Qing; Tianhang Zheng; Di Wang

arXiv:2408.02814·cs.LG·May 27, 2025


Open Access · 1 Repo · 4 Reviews

TL;DR

This paper introduces the PEI attack, a novel method to infer which pre-trained encoder is used in downstream ML services, exposing security vulnerabilities and enabling further malicious attacks.

Contribution

The paper reveals a new PEI attack that can identify hidden encoders in downstream services using only API access, highlighting a significant security threat.

Findings

01

PEI attack successfully infers encoders in vision-based downstream services.

02

PEI attack facilitates model stealing and adversarial attacks.

03

Empirical validation on image classification and multimodal generation tasks.

Abstract

Pre-trained encoders available online have been widely adopted to build downstream machine learning (ML) services, but various attacks against these encoders also pose security and privacy threats to this downstream ML service paradigm. We unveil a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which can extract sensitive encoder information from a targeted downstream ML service that can then be used to promote other ML attacks against the targeted service. Given only API access to a targeted downstream service and a set of candidate encoders, the PEI attack can successfully infer which of the candidate encoders is secretly used by the targeted service. Compared with existing encoder attacks, which mainly target encoders on the upstream side, the PEI attack can compromise encoders even after they have been deployed and hidden in downstream ML…
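
The inference step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear "encoders", the pseudo-inverse synthesis step, the cosine-similarity score, and all names (`candidates`, `make_service`, `synthesize`, `pei_scores`) are assumptions chosen so the example runs with NumPy alone. The idea, following the reviewers' description of embedding-similar but visually different samples, is that an input crafted to match a reference embedding under the correct candidate should make the black-box service behave as it does on the reference, while inputs crafted against the wrong candidates should not.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB, N_CLASSES = 64, 8, 5

# Hypothetical candidate encoders: random linear maps standing in for real networks.
candidates = {name: rng.normal(size=(D_EMB, D_IN))
              for name in ("enc_a", "enc_b", "enc_c")}

def make_service(encoder, head):
    """Build a black-box downstream service that exposes only class probabilities."""
    def service(x):
        logits = head @ (encoder @ x)
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()
    return service

def synthesize(encoder, x_ref, rng):
    """Return an input unrelated to x_ref in input space whose embedding
    under `encoder` equals that of x_ref (exact for a linear encoder)."""
    x0 = rng.normal(size=x_ref.shape)  # random starting point
    correction = np.linalg.pinv(encoder) @ (encoder @ (x_ref - x0))
    return x0 + correction

def pei_scores(service, candidates, n_refs=20, seed=1):
    """Score each candidate by how similarly the service treats reference
    inputs and the inputs synthesized against that candidate."""
    rng = np.random.default_rng(seed)
    refs = rng.normal(size=(n_refs, D_IN))
    scores = {}
    for name, enc in candidates.items():
        sims = []
        for x_ref in refs:
            x_adv = synthesize(enc, x_ref, rng)
            p_ref, p_adv = service(x_ref), service(x_adv)
            cos = p_ref @ p_adv / (np.linalg.norm(p_ref) * np.linalg.norm(p_adv))
            sims.append(cos)
        scores[name] = float(np.mean(sims))
    return scores

# The service secretly uses enc_b; the attacker sees only `target`.
head = rng.normal(size=(N_CLASSES, D_EMB))
target = make_service(candidates["enc_b"], head)
scores = pei_scores(target, candidates)
inferred = max(scores, key=scores.get)
print(scores, inferred)
```

Picking the highest-scoring candidate is a simplification: it always names some encoder, so it cannot report that the hidden encoder is absent from the candidate set. A real attack would need a decision rule that can abstain, and an optimization-based synthesis step for non-linear encoders.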

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper proposes a new attack, PEI, that pivots attention from classic upstream pre-trained encoders to the ones actually tucked inside production downstream ML services. This reframed threat model tracks real deployment practice, fills an obvious hole in prior work, and surfaces a vulnerability that has been hiding in plain sight.
2. The authors validate PEI across image classification and a multimodal generation service (LLaVA). On 18 image-classification targets, it correctly uncovered t

Weaknesses

1. A key fragility is PEI’s sensitivity to input preprocessing and the narrow assumptions behind the proposed bypass. Although the paper notes that PEI fails under common transforms (e.g., JPEG), the suggested workaround (enlarging the image before querying) implicitly assumes the server will apply JPEG to the enlarged image before resizing to the model input. If the server instead resizes first and then applies JPEG, or applies an unconventional, hard-to-predict processing pipeline, the bypass fails. The a

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- Novel attacks
- Well-written paper
- Comprehensive evaluation

Weaknesses

- Limited scope of modalities
- Assumption on candidate set availability
- Impractical motivation

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The experiments are comprehensive and rigorous, with a low false positive rate. The method successfully identifies the correct encoder in 16 out of 18 image classification services. Importantly, it does not produce false positives in the two failure cases, indicating statistical robustness according to the paper’s results.
2. The paper also explores downstream applications of the inferred encoder, such as enhancing model stealing and adversarial attacks, forming a complete and coherent attac

Weaknesses

1. It is unclear why the authors construct adversarial examples in this way. Why do embedding-similar but visually different samples induce distinguishable behaviors on the downstream task? The explanation provided in the paper is not clear enough and needs further elaboration.
2. The method assumes that the attacker possesses a candidate set E containing the true hidden encoder, or at least models sufficiently similar (e.g., from the same model family). However, it remains unclear whether the ap

Reviewer 04 · Rating 2 · Confidence 4

Strengths

1. The paper highlights that encoder privacy can still be compromised in downstream services.
2. The proposed method is adapted to various image encoders, as demonstrated in the experiments.

Weaknesses

1. The authors claim that their method is downstream-task agnostic. However, the proposed algorithm critically relies on continuous input spaces (e.g., pixel-level perturbations) and thus seems tailored to vision encoders. For text encoders with discrete token inputs, the optimization strategy cannot be directly applied, raising doubts about the claimed generality.
2. The datasets tested in the paper are all low-resolution (e.g., CIFAR-10, SVHN), which may be one of the reaso

Code & Models

Repositories

fshp971/encoder-inference
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)

Methods: travel james · Sparse Evolutionary Training