Learning Horn Envelopes via Queries from Large Language Models
Sophie Blum, Raoul Koudijs, Ana Ozaki, Samia Touileb

TL;DR
This paper presents a method for extracting approximate Horn theories from trained neural networks using adapted query-based learning algorithms, and demonstrates it on pre-trained language models to reveal occupation-based gender biases.
Contribution
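It introduces a new algorithm for learning the tightest Horn approximation of a neural network's underlying target theory, adapting Angluin's exact learning model with membership and equivalence queries to the case where the oracle is a trained neural network.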
Findings
Successfully extracted occupation-based gender bias rules from language models.
The algorithm is guaranteed to terminate in exponential time in the worst case, and in polynomial time if the target has polynomially many non-Horn examples.
Demonstrated the approach's applicability on pre-trained language models.
Abstract
We investigate an approach for extracting knowledge from trained neural networks based on Angluin's exact learning model with membership and equivalence queries to an oracle. In this approach, the oracle is a trained neural network. We consider Angluin's classical algorithm for learning Horn theories and study the necessary changes to make it applicable to learn from neural networks. In particular, we have to consider that trained neural networks may not behave as Horn oracles, meaning that their underlying target theory may not be Horn. We propose a new algorithm that aims at extracting the "tightest Horn approximation" of the target theory and that is guaranteed to terminate in exponential time (in the worst case) and in polynomial time if the target has polynomially many non-Horn examples. To showcase the applicability of the approach, we perform experiments on pre-trained language…
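The notion of a "tightest Horn approximation" can be illustrated on a toy example. The sketch below relies on a standard fact about propositional logic: a set of models is definable by a Horn theory exactly when it is closed under intersection, so the models of the tightest Horn approximation are the intersection closure of the target's models. It is a brute-force illustration, not the query-based algorithm of the paper. The function names `horn_envelope_models` and `non_horn_examples` are hypothetical, and reading a "non-Horn example" as an assignment the target rejects but that is the intersection of assignments it accepts is an assumption made here.

```python
from itertools import combinations


def horn_envelope_models(models):
    """Close a set of models under pairwise intersection.

    Each model is a frozenset holding the variables set to true.
    Brute force: only suitable for tiny illustrative examples.
    """
    closed = set(models)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for a, b in combinations(list(closed), 2):
            meet = a & b
            if meet not in closed:
                closed.add(meet)
                changed = True
    return closed


def non_horn_examples(models):
    """Assignments added by the closure (assumed reading of 'non-Horn example')."""
    return horn_envelope_models(models) - set(models)


# Toy target "x or y", which is not Horn: its models are {x}, {y}, {x, y}.
target = {frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})}
envelope = horn_envelope_models(target)
extra = non_horn_examples(target)
```

Here the closure adds only the empty assignment (the intersection of `{x}` and `{y}`), so the envelope of `x or y` accepts every assignment over the two variables. When few such extra assignments exist, the envelope stays close to the target, which matches the intuition behind the polynomial-time case.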
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Natural Language Processing Techniques
