Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Case Study on BERT-based Language Models

Ana Ozaki; Roberto Confalonieri; Ricardo Guimarães; Anders Imenes

arXiv:2412.10513·cs.AI·October 8, 2025

TL;DR

This paper introduces a method, based on the PAC framework, for extracting decision trees from black box BERT-based language models. The extracted trees come with a theoretical fidelity guarantee under certain conditions, and in the case study they reveal occupational gender bias in the language models.

Contribution

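The paper adapts a decision tree algorithm so that, under certain conditions, the extracted tree carries a PAC guarantee on how faithfully it approximates the original model. This gives a principled basis for deciding how far the surrogate can be trusted as an explanation.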

Findings

01  Extracted decision trees reveal gender bias in BERT models.

02  PAC guarantees improve trustworthiness of surrogate models.

03  Method ensures fidelity bounds for model explanations.

Abstract

Decision trees are a popular machine learning method, known for their inherent explainability. In Explainable AI, decision trees can be used as surrogate models for complex black box AI models or as approximations of parts of such models. A key challenge of this approach is determining how accurately the extracted decision tree represents the original model and to what extent it can be trusted as an approximation of its behavior. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Based on theoretical results from the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under certain conditions. We focus on binary classification and conduct experiments where we extract decision trees from BERT-based language models with PAC…
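
To make the idea of a PAC guarantee of fidelity more concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes the textbook sample-size bound for a finite hypothesis class and a learner that is consistent with its training sample, m >= (ln|H| + ln(1/delta)) / epsilon, and it measures fidelity as the fraction of inputs on which a surrogate agrees with the black box. The names `pac_sample_size` and `fidelity`, and the toy classifiers in the usage lines, are hypothetical and chosen only for illustration.

```python
import math


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite hypothesis class.

    Textbook bound (an assumption here, not taken from the paper):
        m >= (ln|H| + ln(1/delta)) / epsilon
    With probability at least 1 - delta, any hypothesis consistent with m
    samples then disagrees with the target on at most an epsilon fraction
    of inputs drawn from the same distribution.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


def fidelity(black_box, surrogate, samples) -> float:
    """Fraction of samples on which the surrogate agrees with the black box."""
    if not samples:
        raise ValueError("samples must be non-empty")
    agreements = sum(1 for x in samples if black_box(x) == surrogate(x))
    return agreements / len(samples)


# Hypothetical usage: a toy binary "black box" and a cruder surrogate.
black_box = lambda x: x[0] > 0 and x[1] > 0
surrogate = lambda x: x[0] > 0
grid = [(a, b) for a in (-1, 1) for b in (-1, 1)]

m = pac_sample_size(hypothesis_count=1000, epsilon=0.1, delta=0.05)
agreement = fidelity(black_box, surrogate, grid)
```

In this sketch, epsilon bounds the disagreement between surrogate and black box, and delta bounds the probability that the sampled data was unrepresentative. The bound grows only logarithmically with the number of candidate hypotheses, which is why restricting the size of the candidate trees keeps the required number of queries to the black box manageable.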

Taxonomy

Topics: Natural Language Processing Techniques · Artificial Intelligence in Law

Methods: Focus