# A Novel Framework for Automated Explain Vision Model Using Vision-Language Models

**Authors:** Phu-Vinh Nguyen, Tan-Hanh Pham, Chris Ngo, Truong Son Hy

arXiv: 2508.20227 · 2025-08-29

## TL;DR

This paper introduces a framework that uses vision-language models (VLMs) to automatically explain the behavior of vision models at both the sample level and the dataset level, supporting interpretability and bias detection.

## Contribution

It presents a pipeline that combines vision-language models with explainable AI (xAI) methods to explain both the general behavior of vision models and their failure cases.

## Key findings

- Explains vision models at both the sample level and the dataset level
- Helps discover failure cases and detect biases
- Integrates xAI analysis into vision model development with minimal effort

## Abstract

The development of many vision models mainly focuses on improving their performance using metrics such as accuracy, IoU, and mAP, with less attention to explainability due to the complexity of applying xAI methods to provide a meaningful explanation of trained models. Although many existing xAI methods aim to explain vision models sample-by-sample, methods explaining the general behavior of vision models, which can only be captured after running on a large dataset, are still underexplored. Furthermore, understanding the behavior of vision models on general images can be very important to prevent biased judgments and help identify the model's trends and patterns. With the application of Vision-Language Models, this paper proposes a pipeline to explain vision models at both the sample and dataset levels. The proposed pipeline can be used to discover failure cases and gain insights into vision models with minimal effort, thereby integrating vision model development with xAI analysis to advance image analysis.
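The abstract names two levels of explanation but does not describe the mechanics. The following is a hypothetical illustration of the general idea, not the authors' pipeline. It assumes a VLM has already described each image with short scene tags (hard-coded here as `vlm_tags`). A sample-level explanation pairs one prediction with that description. A dataset-level explanation aggregates the tags over many predictions and flags tags whose error rate is well above the overall rate, which is one simple way to surface failure cases and possible biases. The names `Sample`, `describe_sample`, `tag_stats`, and `flag_failure_patterns`, along with the `min_count` and `margin` thresholds and the toy data, are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: str
    label: str
    prediction: str
    # Hypothetical stand-in for scene attributes a VLM would generate per image.
    vlm_tags: list[str]


def describe_sample(s: Sample) -> str:
    """Sample-level explanation: the model's outcome plus the VLM's scene description."""
    outcome = "correct" if s.prediction == s.label else f"wrong (predicted {s.prediction})"
    return f"{s.image_id}: label={s.label}, {outcome}; scene: {', '.join(s.vlm_tags)}"


def tag_stats(samples: list[Sample]) -> dict[str, tuple[float, int]]:
    """Dataset-level aggregation: error rate and sample count for every VLM tag."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for s in samples:
        wrong = s.prediction != s.label
        for tag in set(s.vlm_tags):  # count each tag at most once per image
            totals[tag] += 1
            errors[tag] += int(wrong)
    return {tag: (errors[tag] / totals[tag], totals[tag]) for tag in totals}


def flag_failure_patterns(samples, min_count=2, margin=0.2):
    """Flag tags whose error rate exceeds the overall error rate by at least `margin`."""
    overall = sum(s.prediction != s.label for s in samples) / len(samples)
    flagged = {
        tag: rate
        for tag, (rate, n) in tag_stats(samples).items()
        if n >= min_count and rate >= overall + margin
    }
    # Highest error rate first.
    return overall, dict(sorted(flagged.items(), key=lambda kv: -kv[1]))


# Toy data: a classifier that tends to confuse dogs in snow with wolves.
samples = [
    Sample("img1", "dog", "dog", ["grass", "daylight"]),
    Sample("img2", "dog", "wolf", ["snow", "daylight"]),
    Sample("img3", "dog", "wolf", ["snow", "night"]),
    Sample("img4", "cat", "cat", ["indoor", "daylight"]),
    Sample("img5", "cat", "cat", ["indoor", "night"]),
    Sample("img6", "dog", "dog", ["grass", "night"]),
]

overall, flagged = flag_failure_patterns(samples)
print(describe_sample(samples[1]))  # img2: label=dog, wrong (predicted wolf); scene: snow, daylight
print(f"overall error rate: {overall:.2f}")  # overall error rate: 0.33
print("flagged:", flagged)  # flagged: {'snow': 1.0}
```

In this toy run the per-sample view shows only that one image failed. The aggregated view shows that every "snow" image failed, which points to a dataset-level pattern. A real pipeline would obtain the tags or free-text explanations from a VLM rather than hard-coding them.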

---
Source: https://tomesphere.com/paper/2508.20227