APEX: Probing Neural Networks via Activation Perturbation
Tao Ren, Xiaoyu Luo, Qiongxiu Li

TL;DR
APEX perturbs hidden activations at inference time to probe neural network representations, revealing structural information and biases that traditional input-space or parameter analysis cannot access.
Contribution
This paper presents APEX, an inference-time probing paradigm that perturbs hidden activations to explore neural network structure, addressing limitations of prior input-space and parameter-perturbation methods.
Findings
APEX measures both sample regularity and model biases.
Activation perturbation distinguishes structured models from random ones.
APEX reveals training-induced biases, such as class concentration in backdoored models.
Abstract
Prior work on probing neural networks primarily relies on input-space analysis or parameter perturbation, both of which face fundamental limitations in accessing structural information encoded in intermediate representations. We introduce Activation Perturbation for EXploration (APEX), an inference-time probing paradigm that perturbs hidden activations while keeping both inputs and model parameters fixed. We theoretically show that activation perturbation induces a principled transition from sample-dependent to model-dependent behavior by suppressing input-specific signals and amplifying representation-level structure, and further establish that input perturbation corresponds to a constrained special case of this framework. Through representative case studies, we demonstrate the practical advantages of APEX. In the small-noise regime, APEX provides a lightweight and efficient measure of…
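To make the idea of perturbing hidden activations with inputs and parameters held fixed more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation. The toy two-layer MLP, the injection point (after the ReLU), the isotropic Gaussian noise, and the names `init_mlp`, `forward`, `perturbed_predictions`, and `class_histogram` are all assumptions made for illustration.

```python
import numpy as np

def init_mlp(rng, d_in=8, d_hidden=16, n_classes=3):
    """Random two-layer MLP; a hypothetical stand-in for a trained model."""
    return {
        "W1": rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(size=(d_hidden, n_classes)) / np.sqrt(d_hidden),
        "b2": np.zeros(n_classes),
    }

def forward(params, x, noise=None):
    """Return logits; optionally add `noise` to the hidden activations."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    if noise is not None:
        h = h + noise  # assumed injection point: after the nonlinearity
    return h @ params["W2"] + params["b2"]

def perturbed_predictions(params, x, sigma, n_draws, rng):
    """Predicted class for one input under `n_draws` Gaussian activation perturbations."""
    d_hidden = params["b1"].shape[0]
    noise = sigma * rng.normal(size=(n_draws, d_hidden))
    logits = forward(params, x[None, :], noise)  # broadcasts to (n_draws, n_classes)
    return logits.argmax(axis=1)

def class_histogram(preds, n_classes):
    """Fraction of perturbed predictions that fall in each class."""
    return np.bincount(preds, minlength=n_classes) / preds.size

rng = np.random.default_rng(0)
params = init_mlp(rng)
x = rng.normal(size=8)  # the input stays fixed throughout

clean = forward(params, x[None, :]).argmax(axis=1)[0]

# Small noise: how often does the prediction survive the perturbation?
small = perturbed_predictions(params, x, sigma=1e-2, n_draws=500, rng=rng)
print("agreement with clean prediction:", (small == clean).mean())

# Large noise: the noise term dominates the input-dependent activations.
large = perturbed_predictions(params, x, sigma=100.0, n_draws=500, rng=rng)
print("class histogram under large noise:", class_histogram(large, 3))
```

In this toy setting, the small-noise agreement rate is one possible per-sample stability score, and the large-noise histogram depends mostly on the output weights rather than on `x`. Whether these particular statistics match the quantities used in the paper is not something the abstract confirms.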
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
