NeuCEPT: Locally Discover Neural Networks' Mechanism via Critical Neurons Identification with Precision Guarantee
Minh N. Vu, Truc D. Nguyen, My T. Thai

TL;DR
NeuCEPT locally identifies the critical neurons behind a neural network's predictions. It pairs a theoretical framework that keeps identification precision under control with an unsupervised procedure for learning the model's prediction mechanisms.
Contribution
NeuCEPT introduces a mutual-information-based formulation of critical neuron identification, together with a theoretical framework that provides precision guarantees, advancing the interpretability of neural networks.
Findings
Identified neurons strongly influence model predictions.
Neurons encode meaningful information about model mechanisms.
Method outperforms baseline approaches in interpretability tasks.
Abstract
Despite recent studies on understanding deep neural networks (DNNs), there exist numerous questions on how DNNs generate their predictions. In particular, given similar predictions on different input samples, are the underlying mechanisms generating those predictions the same? In this work, we propose NeuCEPT, a method to locally discover critical neurons that play a major role in the model's predictions and to identify the model's mechanisms in generating those predictions. We first formulate the critical neuron identification problem as maximizing a sequence of mutual-information objectives and provide a theoretical framework to efficiently solve for critical neurons while keeping the precision under control. NeuCEPT next heuristically learns the model's different mechanisms in an unsupervised manner. Our experimental results show that neurons identified by NeuCEPT not only have strong influence on…
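To make the mutual-information idea in the abstract concrete, the following is a minimal sketch that ranks neurons by the empirical mutual information between a binarized activation and the model's predicted class. It is not NeuCEPT's algorithm: it scores each neuron independently rather than maximizing a sequence of objectives, and it offers no precision guarantee. The function names, the binarization threshold, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_neurons_by_mi(activations, predictions, threshold=0.0):
    """Rank neurons by MI between 'activation > threshold' and the prediction.

    activations: one row per input sample, one column per neuron (hypothetical layout).
    predictions: the model's predicted class for each sample.
    Returns a list of (neuron_index, mi) pairs, highest MI first.
    """
    num_neurons = len(activations[0])
    scored = []
    for j in range(num_neurons):
        fired = [row[j] > threshold for row in activations]
        scored.append((j, mutual_information(fired, predictions)))
    # Sort by MI descending, breaking ties by neuron index
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy data (assumed): neuron 0 tracks the prediction, neuron 1 is constant,
# neuron 2 is independent of the prediction.
activations = [
    [1.0, 0.5, 1.0],
    [1.0, 0.5, -1.0],
    [-1.0, 0.5, 1.0],
    [-1.0, 0.5, -1.0],
]
predictions = [1, 1, 0, 0]
ranking = rank_neurons_by_mi(activations, predictions)
```

In this toy setting, neuron 0 ranks first because its binarized activation determines the prediction, while the constant and independent neurons carry no information about it.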
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
