The Interpretable Dictionary in Sparse Coding
Edward Kim, Connor Onweller, Andrew O'Brien, Kathleen McCoy

TL;DR
This paper demonstrates that neural networks trained with sparse coding under certain constraints produce more interpretable internal representations than standard deep learning models, enhancing understanding of learned features.
Contribution
It introduces a sparse coding approach that improves interpretability of neural network internal representations compared to traditional models.
Findings
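Sparse coding yields a dictionary that is easier to interpret than the features of a standard deep learning model.
Activations of the dictionary elements produce a selective feature output.
Qualitative and quantitative interpretability benefits over an equivalent feed-forward convolutional autoencoder trained on the same data.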
Abstract
Artificial neural networks (ANNs), specifically deep learning networks, have often been labeled as black boxes because the internal representation of the data is not easily interpretable. In our work, we illustrate that an ANN trained using sparse coding under specific sparsity constraints yields a more interpretable model than the standard deep learning model. The dictionary learned by sparse coding can be more easily understood, and the activations of these elements create a selective feature output. We compare and contrast our sparse coding model with an equivalent feed-forward convolutional autoencoder trained on the same data. Our results show both qualitative and quantitative benefits in the interpretation of the learned sparse coding dictionary as well as the internal activation representations.
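For readers unfamiliar with the term, sparse coding is commonly posed as finding a code a that minimizes 0.5 * ||x - D a||^2 + lam * ||a||_1 for an input x and a dictionary D. The sketch below illustrates that generic formulation using ISTA (iterative soft thresholding), a standard solver. It is an assumption chosen for illustration: this summary does not say which inference algorithm or sparsity constraint the authors use, and the names ista, soft_threshold, lam, and step, as well as the toy dictionary, are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(atoms, x, lam=0.1, step=0.5, n_iter=500):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    atoms: list of dictionary elements, each a list with len(x) entries.
    step:  must not exceed 1 / (largest eigenvalue of D^T D) for convergence.
    """
    n_atoms, dim = len(atoms), len(x)
    code = [0.0] * n_atoms
    for _ in range(n_iter):
        # Reconstruction D a and residual x - D a.
        recon = [sum(code[k] * atoms[k][i] for k in range(n_atoms)) for i in range(dim)]
        resid = [x[i] - recon[i] for i in range(dim)]
        # Correlation of each atom with the residual (negative gradient of the quadratic term).
        corr = [sum(atoms[k][i] * resid[i] for i in range(dim)) for k in range(n_atoms)]
        # Gradient step followed by soft thresholding, which enforces sparsity.
        code = [soft_threshold(code[k] + step * corr[k], step * lam) for k in range(n_atoms)]
    return code


# Hypothetical toy dictionary: three unit-norm atoms in two dimensions.
D = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [0.6, 0.8]  # the input coincides with the third atom
a = ista(D, x)
print(a)
```

In this toy setup the input matches the third atom, so the L1 penalty favors a code that activates only that atom instead of spreading weight across the first two. This is a small-scale picture of the selective activations the abstract describes, not a reproduction of the paper's model.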
Taxonomy
Topics: Computability, Logic, AI Algorithms · Advanced Data Compression Techniques · Neural Networks and Applications
