Extracting Rules from Neural Networks with Partial Interpretations
Cosimo Persia, Ana Ozaki

TL;DR
This paper presents a method for extracting Horn logic rules from neural networks. It applies Angluin's query-based algorithm for learning Horn rules, with the neural network acting as the teacher and the queries formulated over partial interpretations.
Contribution
It combines partial interpretations with Angluin's algorithm for learning Horn rules via queries, so that rules can be extracted from a neural network even when the truth value of some propositions is unknown.
Findings
Effective rule extraction demonstrated empirically
Partial interpretations facilitate learning in incomplete knowledge scenarios
Method shows promise for interpretable AI applications
Abstract
We investigate the problem of extracting rules, expressed in Horn logic, from neural network models. Our work is based on the exact learning model, in which a learner interacts with a teacher (the neural network model) via queries in order to learn an abstract target concept, which in our case is a set of Horn rules. We consider partial interpretations to formulate the queries. These can be understood as a representation of the world where part of the knowledge regarding the truthiness of propositions is unknown. We employ Angluin's algorithm for learning Horn rules via queries and evaluate our strategy empirically.
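To make the notion of a partial interpretation concrete, here is a minimal Python sketch of how a set of Horn rules might be evaluated when some propositions are unknown. It is an illustration only, not the authors' implementation: the three-valued (Kleene-style) semantics, the type aliases, and the names `eval_rule` and `eval_rules` are all assumptions introduced here, and the paper's formal definitions may differ.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed encoding: a partial interpretation maps each variable to
# True, False, or None (None meaning "unknown").
PartialInterpretation = Dict[str, Optional[bool]]
# Assumed encoding: a Horn rule "body -> head"; head None encodes "body -> false".
HornRule = Tuple[FrozenSet[str], Optional[str]]


def eval_rule(rule: HornRule, interp: PartialInterpretation) -> Optional[bool]:
    """Three-valued truth value of one Horn rule; None means undetermined."""
    body, head = rule
    body_vals = [interp.get(v) for v in body]
    head_val = False if head is None else interp.get(head)
    if head_val is True or any(v is False for v in body_vals):
        return True   # satisfied no matter how the unknowns are filled in
    if head_val is False and all(v is True for v in body_vals):
        return False  # body holds and head fails: the rule is falsified
    return None       # the outcome depends on unknown variables


def eval_rules(rules: List[HornRule], interp: PartialInterpretation) -> Optional[bool]:
    """Three-valued truth value of a conjunction of Horn rules."""
    vals = [eval_rule(r, interp) for r in rules]
    if any(v is False for v in vals):
        return False
    if all(v is True for v in vals):
        return True
    return None


# Hypothetical example rule: (a and b) -> c
rules = [(frozenset({"a", "b"}), "c")]
print(eval_rules(rules, {"a": True, "b": True, "c": False}))   # False
print(eval_rules(rules, {"a": True, "b": None, "c": False}))   # None
print(eval_rules(rules, {"a": False, "b": None, "c": None}))   # True
```

In a query-based setting, a learner could pose such partial interpretations to the teacher and use the answers to refine its hypothesis. How the neural network's output is turned into an answer is specific to the paper and is not modeled in this sketch.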
