UNBOX: Unveiling Black-box visual models with Natural-language

Simone Carnemolla; Chiara Russo; Simone Palazzo; Quentin Bouniot; Daniela Giordano; Zeynep Akata; Matteo Pennisi; Concetto Spampinato

arXiv:2603.08639 · cs.CV · April 16, 2026

TL;DR

UNBOX is a framework that uses Large Language Models and text-to-image diffusion models to interpret black-box visual models from their output probabilities alone, revealing learned concepts, biases, and training data characteristics.

Contribution

The paper introduces a data-free, gradient-free, and backpropagation-free method that leverages LLMs and text-to-image diffusion models for class-wise model dissection under black-box constraints.

Findings

1. UNBOX produces human-interpretable descriptors for model classes.

2. It performs competitively with white-box interpretability methods.

3. UNBOX reveals model concepts and biases without internal access.

Abstract

Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white- or gray-box access or knowledge of the training distribution, making them unusable in these real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX leverages Large Language Models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method produces…
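
To make the idea of "activation maximization as a purely semantic search driven by output probabilities" concrete, here is a minimal toy sketch. It is not the authors' implementation: the abstract does not specify the search procedure, so this uses a simple hill climb as a stand-in. All names (`toy_propose`, `toy_generate_image`, `toy_black_box`, `semantic_search`, `VOCAB`) are hypothetical; the LLM, the diffusion model, and the audited classifier are replaced by small deterministic stubs so the loop runs without any external model.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical descriptor vocabulary; a real system would let an LLM propose text freely.
VOCAB = ["striped", "furry", "metallic", "grass", "snow", "water", "wheels", "wings"]


def toy_generate_image(prompt: Sequence[str]) -> FrozenSet[str]:
    """Stub for a text-to-image diffusion model: the 'image' is the set of prompt words."""
    return frozenset(prompt)


def toy_black_box(image: FrozenSet[str]) -> List[float]:
    """Stub for the audited model: exposes class probabilities and nothing else."""
    # Assumed toy behaviour: class 0 reacts to "striped"/"grass", class 1 to "metallic"/"wheels".
    s0 = 1.0 + 2.0 * ("striped" in image) + 1.0 * ("grass" in image)
    s1 = 1.0 + 2.0 * ("metallic" in image) + 1.0 * ("wheels" in image)
    total = s0 + s1
    return [s0 / total, s1 / total]


def toy_propose(prompt: Sequence[str], rng: random.Random) -> List[str]:
    """Stub for an LLM proposer: replace one descriptor with a random vocabulary word."""
    candidate = list(prompt)
    candidate[rng.randrange(len(candidate))] = rng.choice(VOCAB)
    return candidate


def semantic_search(
    target_class: int, steps: int = 300, prompt_len: int = 2, seed: int = 0
) -> Tuple[List[str], float]:
    """Gradient-free hill climb over descriptors, scored only by output probability."""
    rng = random.Random(seed)
    best = [rng.choice(VOCAB) for _ in range(prompt_len)]
    best_score = toy_black_box(toy_generate_image(best))[target_class]
    for _ in range(steps):
        cand = toy_propose(best, rng)
        score = toy_black_box(toy_generate_image(cand))[target_class]
        if score > best_score:  # keep a proposal only if the target probability rises
            best, best_score = cand, score
    return sorted(set(best)), best_score


if __name__ == "__main__":
    descriptors, prob = semantic_search(target_class=0)
    print(descriptors, round(prob, 3))
```

The loop never touches weights, gradients, or training data: the only signal is the probability the black box returns for the target class. The descriptors that survive the search serve as a text summary of what drives that class in the toy model.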
