Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability
Kasra Fouladi, Hamta Rahmani

TL;DR
This paper presents IGBO, a bi-objective optimization framework that balances accuracy and interpretability by encoding feature-importance hierarchies as a DAG and measuring importance with Temporal Integrated Gradients, backed by convergence and statistical guarantees.
Contribution
The paper introduces an interpretability-guided bi-objective optimization method that combines DAG encoding of feature hierarchies, convergence proofs, and statistical guarantees for model interpretability.
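The summary says the feature hierarchy is built by a Central Limit Theorem-based construction but gives no details. One plausible reading is sketched below, under assumptions that are ours, not the paper's. Each feature has several independent importance measurements (the `samples` input and `build_importance_dag` are hypothetical). By the CLT their means are treated as approximately normal, and an edge a → b is added only when a one-sided z-test says feature a is more important than feature b. Because every edge points from a strictly larger mean to a smaller one, the graph is acyclic by construction.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def build_importance_dag(
    samples: Dict[str, List[float]], z_crit: float = 1.645
) -> List[Tuple[str, str]]:
    """Return edges (a, b) meaning 'feature a is more important than feature b'.

    Hypothetical construction: each feature's mean importance is treated as
    approximately normal (CLT), and an edge is added when a one-sided z-test
    on the difference of means exceeds z_crit (1.645 is roughly the 5% level;
    no multiple-comparison correction). Each feature needs >= 2 samples.
    """
    stats = {}
    for name, xs in samples.items():
        # (sample mean, squared standard error of the mean)
        stats[name] = (mean(xs), stdev(xs) ** 2 / len(xs))

    edges = []
    names = sorted(samples)
    for a in names:
        for b in names:
            if a == b:
                continue
            (mu_a, var_a), (mu_b, var_b) = stats[a], stats[b]
            se = math.sqrt(var_a + var_b)
            # With z_crit >= 0, edges only point from a strictly larger mean
            # to a smaller one, so the resulting graph cannot contain a cycle.
            if se == 0.0:
                significant = mu_a > mu_b
            else:
                significant = (mu_a - mu_b) / se > z_crit
            if significant:
                edges.append((a, b))
    return edges
```

For example, with `{"age": [0.9, 0.8, 0.85, 0.95], "income": [0.5, 0.55, 0.45, 0.5], "zip": [0.52, 0.1, 0.9, 0.4]}` the call returns `[("age", "income"), ("age", "zip")]`. The `zip` estimates are too noisy to be ranked against `income`, so no edge connects them. Leaving a pair unordered rather than forcing a ranking is what the statistical test buys.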
Findings
Proposes a DAG-based feature importance hierarchy with statistical guarantees.
Develops a geometric projection for combining task and interpretability gradients.
Provides a convergence proof to Pareto-stationary points.
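The projection mapping P is not defined in this summary. As a hedged illustration, the sketch below uses the standard two-objective min-norm (MGDA-style) combination in its place. That combination is itself a geometric projection: it projects the origin onto the segment between the task gradient and the interpretability gradient. Its output is zero exactly at a Pareto-stationary point. Otherwise, its negative is a descent direction for both objectives. The function names are hypothetical, and this is not necessarily the paper's P.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g_task: Vector, g_interp: Vector) -> Tuple[float, Vector]:
    """Project the origin onto the segment between the two gradients.

    Returns (alpha, d) where d = alpha * g_task + (1 - alpha) * g_interp is the
    minimum-norm point of the segment. d == 0 means the current parameters are
    Pareto-stationary; otherwise -d decreases both objectives to first order.
    """
    diff = [a - b for a, b in zip(g_task, g_interp)]  # g_task - g_interp
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every convex weight gives the same point
    else:
        # Minimize ||g_interp + alpha * diff||^2 in closed form, then clip to [0, 1].
        alpha = min(1.0, max(0.0, -dot(g_interp, diff) / denom))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_task, g_interp)]
    return alpha, d


def descent_step(theta: Vector, g_task: Vector, g_interp: Vector, lr: float = 0.1) -> Vector:
    """One hypothetical parameter update along the combined direction."""
    _, d = min_norm_direction(g_task, g_interp)
    return [t - lr * di for t, di in zip(theta, d)]
```

Consider directly opposing gradients such as (1, 0) and (-1, 0). The function returns alpha = 0.5 and d = (0, 0), a Pareto-stationary point where neither objective can be improved without hurting the other. For orthogonal gradients (1, 0) and (0, 1), it returns d = (0.5, 0.5). A min-norm combination was chosen here because its zero set coincides with Pareto-stationarity, which is the notion the convergence claim refers to.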
Abstract
This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) via a Central Limit Theorem-based construction and uses Temporal Integrated Gradients (TIG) to measure feature importance. The framework employs a novel Relative Importance Score $H_k(X, \theta)$ that quantifies the normalized cumulative attribution of each feature over time. We propose a geometric projection mapping $P$ for combining task and interpretability gradients, and prove convergence to Pareto-stationary points. To address the Out-of-Distribution problem in TIG computation, we outline an Optimal Path Oracle architecture, which we leave for future work. Central Limit Theorem-based construction of the…
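Neither TIG nor $H_k(X, \theta)$ is spelled out in this abstract. The sketch below makes two assumptions of our own. First, TIG is read as standard Integrated Gradients with a zero baseline, applied to a $T \times d$ input sequence so that each (time step, feature) pair gets its own attribution $A_{t,k}$. Second, $H_k = \sum_t |A_{t,k}| / \sum_j \sum_t |A_{t,j}|$. Gradients come from central finite differences, so any Python callable can serve as the model. `dag_violation` is a hypothetical hinge penalty showing how $H$ could be scored against DAG edges. All names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Seq = List[List[float]]  # T time steps x d features


def numerical_grad(f: Callable[[Seq], float], x: Seq, eps: float = 1e-5) -> Seq:
    """Central finite-difference gradient of a scalar output w.r.t. every x[t][k]."""
    grad = []
    for t in range(len(x)):
        row = []
        for k in range(len(x[t])):
            xp = [r[:] for r in x]
            xm = [r[:] for r in x]
            xp[t][k] += eps
            xm[t][k] -= eps
            row.append((f(xp) - f(xm)) / (2.0 * eps))
        grad.append(row)
    return grad


def temporal_integrated_gradients(f: Callable[[Seq], float], x: Seq, steps: int = 32) -> Seq:
    """Assumed TIG: Integrated Gradients from a zero baseline, one value per (t, k).

    Uses a midpoint Riemann sum over the straight path alpha * x, alpha in (0, 1).
    """
    T, d = len(x), len(x[0])
    acc = [[0.0] * d for _ in range(T)]
    for s in range(steps):
        alpha = (s + 0.5) / steps
        g = numerical_grad(f, [[alpha * v for v in row] for row in x])
        for t in range(T):
            for k in range(d):
                acc[t][k] += g[t][k]
    return [[x[t][k] * acc[t][k] / steps for k in range(d)] for t in range(T)]


def relative_importance(attr: Seq) -> List[float]:
    """Assumed H_k: feature k's absolute attribution summed over time, normalized to 1."""
    d = len(attr[0])
    per_feature = [sum(abs(row[k]) for row in attr) for k in range(d)]
    total = sum(per_feature)
    return [v / total if total > 0 else 0.0 for v in per_feature]


def dag_violation(h: Sequence[float], edges: Sequence[Tuple[int, int]], margin: float = 0.0) -> float:
    """Hypothetical hinge penalty: edge (a, b) asks that H_a >= H_b + margin."""
    return sum(max(0.0, h[b] - h[a] + margin) for a, b in edges)
```

As a check, take the toy model $f(X) = \sum_{t,k} w_k X_{t,k}^2$ with $w = (3, 1)$ and $X = [[1, 2], [1, 0]]$. The midpoint rule is exact for this model, so the attributions come out as approximately `[[3, 4], [3, 0]]`. They sum to $f(X) - f(0) = 10$, as Integrated Gradients' completeness property requires, and $H \approx (0.6, 0.4)$.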
