Generating Samples to Probe Trained Models

Eren Mehmet Kıral; Nurşen Aydın; Ş. İlker Birbil

arXiv:2502.06658 · cs.LG · December 22, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a mathematical framework to probe trained machine learning models by generating samples that reveal their data preferences across different scenarios, enhancing interpretability.

Contribution

The work presents a novel framework for interrogating trained models through generated samples, applicable to classification and regression tasks, to understand model behavior.

Findings

01 · Framework successfully identifies model preferences in various scenarios

02 · Generated samples provide insights into model decision-making

03 · Applicable to a range of models and tasks

Abstract

There is a growing need for investigating how machine learning models operate. With this work, we aim to understand trained machine learning models by questioning their data preferences. We propose a mathematical framework that allows us to probe trained models and identify their preferred samples in various scenarios including prediction-risky, parameter-sensitive, or model-contrastive samples. To showcase our framework, we pose these queries to a range of models trained on a range of classification and regression tasks, and receive answers in the form of generated data.
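To make the idea of a "prediction-risky" query concrete, here is a minimal sketch, not the authors' algorithm. The reviews on this page describe the framework as a Gibbs formulation with a temperature $\tau$; everything else below is an assumption made for illustration: the toy logistic-regression "trained model" (`W`, `B`), the choice of negative predictive entropy as the probing function, the Gaussian prior that keeps the density normalizable, and the random-walk Metropolis sampler.

```python
import numpy as np

# Hypothetical "trained" binary classifier: logistic regression with fixed weights.
W = np.array([2.0, -1.0])
B = 0.5

def predict_proba(x):
    """Probability of class 1 for a single 2-D input x."""
    return 1.0 / (1.0 + np.exp(-(x @ W + B)))

def probe_energy(x, tau=0.05, sigma=2.0):
    """Energy that is low where the classifier is uncertain (assumed probing function)."""
    p = np.clip(predict_proba(x), 1e-12, 1 - 1e-12)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    # Negative entropy scaled by the temperature, plus a Gaussian prior term (assumption).
    return -entropy / tau + np.sum(x**2) / (2 * sigma**2)

def metropolis(energy, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler targeting the density proportional to exp(-energy)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    e = energy(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=2)
        e_new = energy(proposal)
        # Accept with probability min(1, exp(e - e_new)).
        if np.log(rng.uniform()) < e - e_new:
            x, e = proposal, e_new
        samples.append(x.copy())
    return np.array(samples[n_steps // 5:])  # drop the first 20% as burn-in

samples = metropolis(probe_energy)

# Compare against draws from the prior alone: probed samples should sit
# much closer to the decision boundary (small |logit|).
prior = np.random.default_rng(1).normal(scale=2.0, size=samples.shape)
margin_probe = np.abs(samples @ W + B).mean()
margin_prior = np.abs(prior @ W + B).mean()
print(f"mean |logit| probed: {margin_probe:.2f}, prior: {margin_prior:.2f}")
```

Lowering `tau` in this sketch concentrates the generated samples more tightly around the decision boundary, while raising it lets the prior dominate. Other queries named in the abstract (parameter-sensitive, model-contrastive) would need a different energy function, which is not specified on this page.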

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper introduces a clear variational formulation that symmetrically parallels model training and data synthesis. The proposed framework is very general, can handle various kinds of objective functions, and is model-agnostic.

Weaknesses

While the conceptual insight is interesting, at least based on my knowledge in this domain, the paper has relatively marginal novelty in its specific algorithm (which is not necessarily a weakness, though). The paper currently lacks a discussion of how the proposed framework can be extended to the discrete input space. While a VAE decoder can enforce data-manifold constraints, this is still difficult for generating language data. The experiments are mostly qualitative and small-scale. No b

Reviewer 02 · Rating 4 · Confidence 1

Strengths

The idea of formulating probing as a generative process is interesting and connects interpretability with probabilistic modeling. Overall, the proposed framework is flexible, allowing different “questions” to be posed to a trained model.

Weaknesses

Overall, I find the paper’s organization difficult to follow, particularly in Section 2. The core mathematical framework is presented in a dense and abstract way, which obscures how the proposed method is actually implemented. Several key equations (e.g., Eq. 2–4) are introduced without sufficient intuition or explanation. Moreover, the design rationale behind the probing function G across different use cases remains unclear—for instance, it is not evident why Eq. (5) appropriately captures mode

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper is original in framing model probing as a data generation problem using a Bayesian Gibbs formulation. It is clearly written, with solid theoretical grounding and well-chosen illustrative experiments. The framework is significant in that it offers a unified and general-purpose approach for probing trained models —both differentiable and non-differentiable— across multiple axes of behavior, including uncertainty, disagreement, sensitivity, and counterfactual exploration. It also has pot

Weaknesses

- Evaluation is mostly qualitative relying primarily on visual examples and descriptive comparisons; adding quantitative metrics for uncertainty, sensitivity, disagreement and counterfactual would strengthen the evidence.
- The latent-space sampling relies on a pretrained VAE but lacks explicit regularization to prevent drift off the data manifold, which may affect sample fidelity in practice.
- Hyperparameter sensitivity is unexplored; the temperature $\tau$ and sampling parameters likely aff

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification