Generating Samples to Probe Trained Models
Eren Mehmet Kıral, Nurşen Aydın, Ş. İlker Birbil

TL;DR
This paper introduces a mathematical framework to probe trained machine learning models by generating samples that reveal their data preferences across different scenarios, enhancing interpretability.
Contribution
The work presents a novel framework for interrogating trained models through generated samples, applicable to classification and regression tasks, to understand model behavior.
Findings
Framework successfully identifies model preferences in various scenarios
Generated samples provide insights into model decision-making
Applicable to a range of models and tasks
Abstract
There is a growing need for investigating how machine learning models operate. With this work, we aim to understand trained machine learning models by questioning their data preferences. We propose a mathematical framework that allows us to probe trained models and identify their preferred samples in various scenarios including prediction-risky, parameter-sensitive, or model-contrastive samples. To showcase our framework, we pose these queries to a range of models trained on a range of classification and regression tasks, and receive answers in the form of generated data.
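As a rough illustration of what a "prediction-risky" query could look like, the sketch below draws samples from a density proportional to exp(-G(x)/tau), where G is small for inputs on which a model is uncertain. This is not the authors' algorithm. The toy logistic model, the names `model_prob`, `probe_risky`, and `sample_probe`, the random-walk Metropolis sampler, the bounding box, and all parameter values are assumptions made for this example. Only the general idea of a Gibbs-style formulation with a temperature comes from the reviews quoted on this page.

```python
import math
import random


def model_prob(x, w=(2.0, -1.0), b=0.5):
    """Hypothetical 'trained' logistic classifier returning P(y=1 | x)."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))


def probe_risky(x):
    """Hypothetical probing function G: near zero where the model is unsure."""
    return (model_prob(x) - 0.5) ** 2


def sample_probe(G, x0, tau=0.01, step=0.3, n_steps=5000, box=3.0, seed=0):
    """Random-walk Metropolis targeting exp(-G(x)/tau) on [-box, box]^2.

    The box is an assumption that keeps the target density normalizable;
    the sampler only evaluates G, so the model need not be differentiable.
    """
    rng = random.Random(seed)
    x = list(x0)
    g = G(x)
    samples = []
    for _ in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        # Proposals outside the box have zero target density and are rejected.
        if all(abs(p) <= box for p in proposal):
            g_new = G(proposal)
            # Accept with probability min(1, exp(-(g_new - g) / tau)).
            if g_new <= g or rng.random() < math.exp(-(g_new - g) / tau):
                x, g = proposal, g_new
        samples.append(tuple(x))
    return samples


samples = sample_probe(probe_risky, x0=(0.0, 0.0))
kept = samples[1000:]  # discard an initial burn-in segment
mean_gap = sum(abs(model_prob(s) - 0.5) for s in kept) / len(kept)
print(f"mean |p - 0.5| over kept samples: {mean_gap:.3f}")
```

Lowering `tau` concentrates the samples more tightly around the decision boundary, while raising it spreads them toward the uniform distribution on the box. Replacing `probe_risky` with a different G would pose a different question to the same model.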
Peer Reviews
Decision: Submitted to ICLR 2026
The paper introduces a clear variational formulation that symmetrically parallels model training and data synthesis. The proposed framework is very general, can handle various kinds of objective functions, and is model-agnostic.
While the conceptual insight is interesting, at least based on my knowledge in this domain, the paper has relatively marginal novelty in its specific algorithm (which is not necessarily a weakness, though). The paper currently lacks a discussion of how the proposed framework can be extended to the discrete input space. While a VAE decoder can enforce data-manifold constraints, this is still difficult for generating language data. The experiments are mostly qualitative and small-scale. No b
The idea of formulating probing as a generative process is interesting and connects interpretability with probabilistic modeling. Overall, the proposed framework is flexible, allowing different “questions” to be posed to a trained model.
Overall, I find the paper’s organization difficult to follow, particularly in Section 2. The core mathematical framework is presented in a dense and abstract way, which obscures how the proposed method is actually implemented. Several key equations (e.g., Eq. 2–4) are introduced without sufficient intuition or explanation. Moreover, the design rationale behind the probing function G across different use cases remains unclear—for instance, it is not evident why Eq. (5) appropriately captures mode
The paper is original in framing model probing as a data generation problem using a Bayesian Gibbs formulation. It is clearly written, with solid theoretical grounding and well-chosen illustrative experiments. The framework is significant in that it offers a unified and general-purpose approach for probing trained models —both differentiable and non-differentiable— across multiple axes of behavior, including uncertainty, disagreement, sensitivity, and counterfactual exploration. It also has pot
- Evaluation is mostly qualitative, relying primarily on visual examples and descriptive comparisons; adding quantitative metrics for uncertainty, sensitivity, disagreement and counterfactual would strengthen the evidence.
- The latent-space sampling relies on a pretrained VAE but lacks explicit regularization to prevent drift off the data manifold, which may affect sample fidelity in practice.
- Hyperparameter sensitivity is unexplored; the temperature $\tau$ and sampling parameters likely aff
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
