# Understanding and evaluating computer vision models through the lens of counterfactuals

**Authors:** Pushkar Shukla

arXiv: 2508.20881 · 2025-08-29

## TL;DR

This thesis develops counterfactual-based frameworks to interpret, audit, and mitigate bias in vision classifiers and text-to-image generative models, with the aim of fairer and more robust systems.

## Contribution

The thesis introduces five counterfactual-based methods (CAVLI, ASAC, TIBET, BiasConnect, and InterMit) for detecting and mitigating bias in vision models, treating counterfactuals as a common lens for interpretability, fairness, and causality.

## Key findings

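- CAVLI combines LIME attribution with TCAV concept analysis to quantify how strongly a classifier relies on human-interpretable concepts.
- ASAC fine-tunes biased classifiers on adversarial counterfactuals that perturb protected attributes while preserving semantics.
- TIBET and BiasConnect enable causal bias evaluation in text-to-image models, and InterMit mitigates intersectional bias without retraining.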
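
To make the idea of concept reliance concrete, here is a minimal TCAV-style sketch in Python. It is not CAVLI itself and not the authors' code: the function names, the toy data, and the linear classifier head are all assumptions made for illustration, and the concept direction is taken as a difference of mean activations rather than the trained linear classifier that TCAV normally uses.

```python
import numpy as np

def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations.

    Simplification (assumption): difference of means, not a trained linear probe.
    """
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def concept_score(logit_grads, cav):
    """Fraction of inputs whose class logit increases along the concept direction."""
    directional = logit_grads @ cav
    return float((directional > 0).mean())

# Toy activations (hypothetical): the concept shifts dimension 0 of an 8-d layer.
rng = np.random.default_rng(0)
dim = 8
concept_dir = np.zeros(dim)
concept_dir[0] = 1.0
concept_acts = rng.normal(size=(50, dim)) + 3.0 * concept_dir
random_acts = rng.normal(size=(50, dim))
cav = concept_activation_vector(concept_acts, random_acts)

# Toy linear head (hypothetical): logit = w . activation, so the gradient of the
# logit with respect to the activation is w for every input.
w = np.zeros(dim)
w[0] = 2.0
logit_grads = np.tile(w, (20, 1))
score = concept_score(logit_grads, cav)
```

In this toy setup the class logit depends only on the dimension the concept shifts, so every input's logit rises along the concept direction. With a real network, the gradients would come from backpropagation through the chosen layer.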

## Abstract

Counterfactual reasoning, the practice of asking "what if" by varying inputs and observing changes in model behavior, has become central to interpretable and fair AI. This thesis develops frameworks that use counterfactuals to explain, audit, and mitigate bias in vision classifiers and generative models. By systematically altering semantically meaningful attributes while holding others fixed, these methods uncover spurious correlations, probe causal dependencies, and help build more robust systems.

The first part addresses vision classifiers. CAVLI integrates attribution (LIME) with concept-level analysis (TCAV) to quantify how strongly decisions rely on human-interpretable concepts. With localized heatmaps and a Concept Dependency Score, CAVLI shows when models depend on irrelevant cues like backgrounds. Extending this, ASAC introduces adversarial counterfactuals that perturb protected attributes while preserving semantics. Through curriculum learning, ASAC fine-tunes biased models for improved fairness and accuracy while avoiding stereotype-laden artifacts.

The second part targets generative Text-to-Image (TTI) models. TIBET provides a scalable pipeline for evaluating prompt-sensitive biases by varying identity-related terms, enabling causal auditing of how race, gender, and age affect image generation. To capture interactions, BiasConnect builds causal graphs diagnosing intersectional biases. Finally, InterMit offers a modular, training-free algorithm that mitigates intersectional bias via causal sensitivity scores and user-defined fairness goals.

Together, these contributions show counterfactuals as a unifying lens for interpretability, fairness, and causality in both discriminative and generative models, establishing principled, scalable methods for socially responsible bias evaluation and mitigation.
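
The core move described in the abstract, varying one attribute while holding the others fixed and observing how the output changes, can be sketched in a few lines of Python. This is an illustrative sketch only, not the TIBET pipeline: the template, the attribute values, `toy_score`, and both function names are hypothetical, and `toy_score` stands in for whatever measurement one would actually take on generated images.

```python
from itertools import product
from typing import Callable, Dict, List

def counterfactual_prompts(template: str, attributes: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Fill every combination of attribute values into the prompt template."""
    names = list(attributes)
    variants = []
    for values in product(*(attributes[n] for n in names)):
        assignment = dict(zip(names, values))
        variants.append({"prompt": template.format(**assignment), **assignment})
    return variants

def attribute_sensitivity(variants: List[Dict[str, str]],
                          score: Callable[[str], float],
                          attribute: str) -> float:
    """Largest gap in mean score between the values of one attribute."""
    groups: Dict[str, List[float]] = {}
    for v in variants:
        groups.setdefault(v[attribute], []).append(score(v["prompt"]))
    means = [sum(s) / len(s) for s in groups.values()]
    return max(means) - min(means)

def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for a model measurement; reacts only to the word 'young'."""
    return 1.0 if "young" in prompt.split() else 0.0

variants = counterfactual_prompts(
    "a photo of one {age} {gender} doctor",
    {"age": ["young", "elderly"], "gender": ["man", "woman"]},
)
age_gap = attribute_sensitivity(variants, toy_score, "age")
gender_gap = attribute_sensitivity(variants, toy_score, "gender")
```

Because `toy_score` responds only to the age term, the age gap is large and the gender gap is zero. Averaging over the other attributes is what isolates the effect of the one being varied.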

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20881/full.md

## Figures

41 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20881/full.md

## References

222 references — full list in the complete paper: https://tomesphere.com/paper/2508.20881/full.md

---
Source: https://tomesphere.com/paper/2508.20881