Interpreting recurrent neural networks behaviour via excitable network attractors
Andrea Ceni, Peter Ashwin, Lorenzo Livi

TL;DR
This paper introduces a novel method that uses excitable network attractors to interpret the behaviour of recurrent neural networks, clarifying how they reach decisions in sequential tasks.
Contribution
It proposes a new mathematical framework and algorithm to extract and interpret network attractors directly from neural network trajectories, improving explainability.
Findings
Excitable network attractors give an interpretable account of RNN behaviour on finite-state tasks
The method reveals the stable states of a trained RNN and the input-driven transitions between them
Simulations support the approach as a way to understand the network dynamics
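The idea of reading stable states and input-driven transitions off a recurrent network can be illustrated with a small sketch. This is not the authors' algorithm; it is a simplified illustration under stated assumptions: a discrete-time update of the form x' = tanh(W x + W_in u), toy hand-picked weights, and hypothetical helper names (`step`, `settle`, `stable_states`, `transition`). It lets sampled states relax with no input, collects the distinct limits as stable states, and then records which stable state each input pulse leads to.

```python
import numpy as np

def step(x, u, W, W_in):
    """One update of an assumed discrete-time RNN: x' = tanh(W x + W_in u)."""
    return np.tanh(W @ x + W_in @ u)

def settle(x, W, W_in, n_iter=500):
    """Iterate the input-free map from x; the limit approximates a stable state."""
    zero = np.zeros(W_in.shape[1])
    for _ in range(n_iter):
        x = step(x, zero, W, W_in)
    return x

def stable_states(samples, W, W_in, tol=1e-3):
    """Settle each sampled state and keep only the distinct limits."""
    found = []
    for x in samples:
        x_star = settle(x, W, W_in)
        if not any(np.linalg.norm(x_star - f) < tol for f in found):
            found.append(x_star)
    return found

def transition(i, u, states, W, W_in):
    """Index of the stable state reached after a single input pulse u from state i."""
    x_end = settle(step(states[i], u, W, W_in), W, W_in)
    return int(np.argmin([np.linalg.norm(x_end - s) for s in states]))

# Toy one-neuron network (assumed weights) that behaves like a flip-flop.
W = np.array([[2.0]])
W_in = np.array([[3.0]])

# Stand-ins for states sampled along network trajectories.
samples = [np.array([v]) for v in (-0.8, -0.3, 0.4, 0.9)]
states = stable_states(samples, W, W_in)

# Transition table: (index of current stable state, input value) -> next index.
table = {(i, u): transition(i, np.array([u]), states, W, W_in)
         for i in range(len(states)) for u in (-1.0, 1.0)}
```

In this toy setting the table plays the role of a finite-state description of the network: each stable state is a node, and each input pulse labels an edge to the state the network settles into afterwards.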
Abstract
Introduction: Machine learning provides fundamental tools both for scientific research and for the development of technologies with significant impact on society. It provides methods that facilitate the discovery of regularities in data and that give predictions without explicit knowledge of the rules governing a system. However, a price is paid for exploiting such flexibility: machine learning methods are typically black-boxes where it is difficult to fully understand what the machine is doing or how it is operating. This poses constraints on the applicability and explainability of such methods. Methods: Our research aims to open the black-box of recurrent neural networks, an important family of neural networks used for processing sequential data. We propose a novel methodology that provides a mechanistic interpretation of behaviour when solving a computational task. Our methodology…
