Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Case Study on BERT-based Language Models
Ana Ozaki, Roberto Confalonieri, Ricardo Guimarães, Anders Imenes

TL;DR
This paper introduces a method, based on the PAC framework, for extracting decision trees from black box BERT-based models. The method provides theoretical fidelity guarantees, and the extracted trees reveal occupational gender bias in the language models.
Contribution
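The paper adapts a decision tree algorithm so that, under certain conditions, the extracted tree carries a PAC guarantee of fidelity to the black box model it approximates, which makes the surrogate a more trustworthy explanation of a complex AI model.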
Findings
Extracted decision trees reveal gender bias in BERT models.
PAC guarantees improve trustworthiness of surrogate models.
Method ensures fidelity bounds for model explanations.
Abstract
Decision trees are a popular machine learning method, known for their inherent explainability. In Explainable AI, decision trees can be used as surrogate models for complex black box AI models or as approximations of parts of such models. A key challenge of this approach is determining how accurately the extracted decision tree represents the original model and to what extent it can be trusted as an approximation of its behavior. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Based on theoretical results from the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under certain conditions. We focus on binary classification and conduct experiments where we extract decision trees from BERT-based language models with PAC…
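To make the idea of a PAC fidelity guarantee concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the textbook sample-size bound for a finite hypothesis class and a learner that is consistent with the sampled labels, m ≥ (ln|H| + ln(1/δ)) / ε; the summary above does not state which bound or conditions the paper actually uses. The function names `pac_sample_size` and `empirical_fidelity`, and the toy classifiers in the usage note, are hypothetical.

```python
import math
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")


def pac_sample_size(epsilon: float, delta: float, hypothesis_count: int) -> int:
    """Samples sufficient for a consistent learner over a finite class.

    Assumption: the standard bound m >= (ln|H| + ln(1/delta)) / epsilon,
    where epsilon is the tolerated disagreement rate and 1 - delta is the
    confidence. This is a generic PAC result, not the paper's exact bound.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    bound = (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    return math.ceil(bound)


def empirical_fidelity(
    black_box: Callable[[X], bool],
    surrogate: Callable[[X], bool],
    samples: Iterable[X],
) -> float:
    """Fraction of samples on which the surrogate agrees with the black box."""
    items = list(samples)
    if not items:
        raise ValueError("at least one sample is required")
    agreements = sum(black_box(x) == surrogate(x) for x in items)
    return agreements / len(items)
```

As a usage illustration, `pac_sample_size(0.1, 0.05, 1000)` asks how many labelled queries suffice for at most 10% disagreement with 95% confidence when choosing among 1000 candidate trees, and `empirical_fidelity(lambda x: x > 0, lambda x: x > 1, [-1, 0, 1, 2])` measures agreement between two toy threshold classifiers on four points. The bound grows only logarithmically in the number of candidate trees, which is why restricting tree depth or feature count keeps the required number of black box queries manageable.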
Taxonomy
Topics: Natural Language Processing Techniques · Artificial Intelligence in Law
