Black Box to White Box: Discover Model Characteristics Based on Strategic Probing
Josh Kalin, Matthew Ciolino, David Noever, Gerry Dozier

TL;DR
This paper presents a method to infer a model's architecture and primary training dataset from its outputs using strategic probing. It is applied to image classifiers and GPT-2-based text transformers and shows that these characteristics are distinguishable.
Contribution
It introduces a structured probing approach to classify model architecture and dataset origin, advancing understanding of model interpretability and attribution in machine learning.
Findings
Image classifiers' architectures and datasets are distinguishable via probing.
GPT-2 text transformers show dataset-specific output diversity, but architecture attribution remains challenging.
The method enables model attribute inference without internal access.
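The summary does not say how output diversity is measured, so the following is only an illustrative sketch. It uses the common distinct-n ratio (unique n-grams divided by total n-grams) as an assumed stand-in; the function name `distinct_n` and the whitespace tokenization are hypothetical choices, not the authors' metric.

```python
def distinct_n(texts, n=2):
    """Ratio of unique n-grams to total n-grams across a set of generated texts.

    Assumption: whitespace tokenization and lowercasing; the paper's actual
    diversity measure is not stated in this summary.
    """
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of size n over the token list.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Two sets of made-up generations: one repetitive, one varied.
repetitive = ["the cat sat", "the cat sat"]
varied = ["the cat sat", "a dog ran"]

repetitive_score = distinct_n(repetitive)
varied_score = distinct_n(varied)
```

Under this assumed metric, models fine-tuned on different datasets could be compared by scoring samples generated from the same prompts; a lower ratio indicates more repeated phrasing.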
Abstract
In Machine Learning, White Box Adversarial Attacks rely on knowledge of the underlying model attributes. This work focuses on discovering two distinct pieces of model information: the underlying architecture and the primary training dataset. With the process in this paper, a structured set of input probes and the corresponding model outputs become the training data for a deep classifier. Two subdomains in Machine Learning are explored: image-based classifiers and text transformers with GPT-2. With image classification, the focus is on exploring commonly deployed architectures and datasets available in popular public libraries. Using a single transformer architecture with multiple levels of parameters, text generation is explored by fine-tuning on different datasets. Each dataset explored in image and text is distinguishable from the others. Diversity in text transformer outputs…
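The probing idea in the abstract can be sketched roughly as follows: send a fixed set of probes to a black-box model, flatten the returned outputs into a fingerprint, and train a classifier that maps fingerprints to a model attribute. Everything concrete here is an assumption for illustration. The toy `make_black_box` models, the `family_a`/`family_b` labels, the random probes, and the nearest-centroid classifier (swapped in for the deep classifier the abstract mentions) are all hypothetical and are not the authors' setup.

```python
import math
import random


def make_black_box(family, seed):
    """Hypothetical stand-in for a deployed classifier that can only be queried."""
    rng = random.Random(seed)
    # Assumed family trait: how sharply the model commits to its top class.
    sharpness = {"family_a": 1.0, "family_b": 4.0}[family]
    weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]

    def predict(x):
        logits = [sharpness * sum(w * xi for w, xi in zip(row, x)) for row in weights]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]

    return predict


def fingerprint(model, probes):
    """Query the model with every probe and flatten the outputs into one vector."""
    features = []
    for probe in probes:
        # Sort scores so the fingerprint ignores which class index is on top.
        features.extend(sorted(model(probe), reverse=True))
    return features


def fit_centroids(fingerprints, labels):
    """Average the fingerprints of each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for feats, label in zip(fingerprints, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_label(centroids, feats):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], feats))
    return min(centroids, key=sq_dist)


# A fixed, structured probe set shared by every queried model.
probe_rng = random.Random(0)
PROBES = [[probe_rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]

# Build labeled fingerprints from models whose attribute is known.
train_labels = ["family_a", "family_b"] * 20
train_prints = [
    fingerprint(make_black_box(label, seed), PROBES)
    for seed, label in enumerate(train_labels)
]
centroids = fit_centroids(train_prints, train_labels)

# Attribute an unseen model using only its responses to the probes.
suspect = make_black_box("family_b", seed=999)
guess = predict_label(centroids, fingerprint(suspect, PROBES))
```

The key design point is that the probe set stays fixed across all models, so differences between fingerprints reflect the models rather than the inputs. The same pattern would apply to dataset attribution by changing what the labels represent.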
Taxonomy
Methods: Linear Layer · Cosine Annealing · Layer Normalization · Weight Decay · Dropout · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Attention Is All You Need · Byte Pair Encoding
