Symbolic Rule Extraction from Attention-Guided Sparse Representations in Vision Transformers
Parth Padalkar, Gopal Gupta

TL;DR
This paper introduces a novel method for extracting symbolic, logic-based rules from Vision Transformers by using a sparse concept layer and rule-generation algorithms, improving interpretability and accuracy.
Contribution
It presents the first framework for symbolic rule extraction from ViTs using a sparse concept layer and logic programming, enhancing interpretability and model performance.
Findings
Achieved 5.14% higher accuracy than standard ViT
Generated concise, meaningful logic rules from ViT representations
Enabled direct symbolic reasoning within the ViT architecture
Abstract
Recent neuro-symbolic approaches have successfully extracted symbolic rule-sets from CNN-based models to enhance interpretability. However, applying similar techniques to Vision Transformers (ViTs) remains challenging due to their lack of modular concept detectors and reliance on global self-attention mechanisms. We propose a framework for symbolic rule extraction from ViTs by introducing a sparse concept layer inspired by Sparse Autoencoders (SAEs). This linear layer operates on attention-weighted patch representations and learns a disentangled, binarized representation in which individual neurons activate for high-level visual concepts. To encourage interpretability, we apply a combination of L1 sparsity, entropy minimization, and supervised contrastive loss. These binarized concept activations are used as input to the FOLD-SE-M algorithm, which generates a rule-set in the form of…
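The concept layer described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sigmoid activation, the 0.5 binarization threshold, the loss weights, the tensor shapes, and all function names are assumptions made for illustration. The supervised contrastive term and the FOLD-SE-M rule-generation step are omitted.

```python
import numpy as np

def attention_weighted_pool(patches, cls_attention):
    """Pool patch embeddings (num_patches, dim) using normalized attention weights."""
    weights = cls_attention / cls_attention.sum()
    return weights @ patches  # shape: (dim,)

def concept_layer(pooled, W, b):
    """Linear layer followed by a sigmoid (assumed) to get soft activations in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(pooled @ W + b)))

def l1_sparsity(acts):
    """L1 penalty: pushes most concept neurons toward zero activation."""
    return float(np.abs(acts).mean())

def binary_entropy(acts, eps=1e-8):
    """Entropy penalty: minimal when activations sit near 0 or 1."""
    a = np.clip(acts, eps, 1.0 - eps)
    return float(-(a * np.log(a) + (1.0 - a) * np.log(1.0 - a)).mean())

def binarize(acts, threshold=0.5):
    """Hard threshold (assumed value) producing binary concept features."""
    return (acts > threshold).astype(np.int8)

# Toy data standing in for ViT outputs (hypothetical shapes).
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))       # 196 patch embeddings of width 64
cls_attention = rng.random(196)            # attention paid to each patch
W = rng.normal(scale=0.1, size=(64, 16))   # 16 concept neurons
b = np.zeros(16)

pooled = attention_weighted_pool(patches, cls_attention)
acts = concept_layer(pooled, W, b)
concepts = binarize(acts)

# Regularization part of the objective; the weights 0.1 are placeholders.
reg_loss = 0.1 * l1_sparsity(acts) + 0.1 * binary_entropy(acts)
```

In this sketch, `concepts` is the binary vector that would be handed to a rule learner, one row per image. The entropy term matters because it drives the soft activations toward 0 or 1 during training, so the hard threshold discards little information.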
Taxonomy
Methods: Linear Layer
