Extracting Rules from Neural Networks with Partial Interpretations

Cosimo Persia; Ana Ozaki

arXiv:2204.00360·cs.LG·April 4, 2022

TL;DR

This paper extracts Horn logic rules from neural networks within the exact learning model: a learner running Angluin's algorithm queries the network, which acts as the teacher, and the queries are formulated as partial interpretations.

Contribution

It casts rule extraction from neural networks as exact learning of a set of Horn rules, formulates the learner's queries as partial interpretations, and evaluates the strategy empirically with Angluin's algorithm.

Findings

01. Effective rule extraction demonstrated empirically

02. Partial interpretations facilitate learning in incomplete knowledge scenarios

03. Method shows promise for interpretable AI applications

Abstract

We investigate the problem of extracting rules, expressed in Horn logic, from neural network models. Our work is based on the exact learning model, in which a learner interacts with a teacher (the neural network model) via queries in order to learn an abstract target concept, which in our case is a set of Horn rules. We consider partial interpretations to formulate the queries. These can be understood as a representation of the world where part of the knowledge regarding the truthiness of propositions is unknown. We employ Angluin's algorithm for learning Horn rules via queries and evaluate our strategy empirically.
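
To make the notion of a query over a partial interpretation concrete, here is a minimal sketch. It is not the paper's implementation: the teacher is a hidden set of Horn rules standing in for the neural network, and the names `satisfies`, `completions`, and `membership_query`, as well as the convention that the teacher answers "yes" when at least one completion of the partial interpretation satisfies the target, are assumptions made only for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

# A Horn rule "body -> head"; a head of None stands for "body -> false".
Rule = Tuple[FrozenSet[str], Optional[str]]
# A partial interpretation: True, False, or None (unknown) for each variable.
Partial = Dict[str, Optional[bool]]


def satisfies(total: Dict[str, bool], rules: List[Rule]) -> bool:
    """Check a total interpretation against every Horn rule."""
    for body, head in rules:
        body_true = all(total[v] for v in body)
        head_true = head is not None and total[head]
        if body_true and not head_true:
            return False
    return True


def completions(partial: Partial) -> Iterator[Dict[str, bool]]:
    """Yield every total interpretation that agrees with the known values."""
    unknown = [v for v, val in partial.items() if val is None]
    for bits in product([False, True], repeat=len(unknown)):
        total = {v: val for v, val in partial.items() if val is not None}
        total.update(zip(unknown, bits))
        yield total


def membership_query(partial: Partial, target: List[Rule]) -> bool:
    """Hypothetical teacher (assumption): answer 'yes' if some completion
    of the partial interpretation satisfies the hidden target rules."""
    return any(satisfies(t, target) for t in completions(partial))


# Hidden target: (a and b -> c) and (c and d -> false).
target = [(frozenset({"a", "b"}), "c"), (frozenset({"c", "d"}), None)]

# c is unknown: setting c to True satisfies both rules.
q1 = {"a": True, "b": True, "c": None, "d": False}
# c is known to be False while a and b are True: the first rule is violated.
q2 = {"a": True, "b": True, "c": False, "d": None}

print(membership_query(q1, target))
print(membership_query(q2, target))
```

The brute-force enumeration of completions is exponential in the number of unknown variables and is used here only to keep the example short. In the setting the abstract describes, the answers would come from the neural network rather than from a known rule set.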
