
TL;DR
This paper reviews methods for identifying concepts within DNNs, discusses how conceptual spaces are shaped by accuracy and compression tradeoffs, and critically examines the limitations of current identification techniques.
Contribution
It provides a critical analysis of existing methods for concept identification in DNNs and explores the philosophical implications of conceptual space formation.
Findings
DNNs can represent non-trivial inferential relations between concepts
Current methods have severe limitations in accurately identifying concepts
Conceptual spaces are influenced by a tradeoff between accuracy and compression
Abstract
The present paper reviews and discusses work from computer science that proposes to identify concepts in internal representations (hidden layers) of DNNs. It examines, first, how existing methods actually identify concepts that are supposedly represented in DNNs. Second, it discusses how conceptual spaces (sets of concepts in internal representations) are shaped by a tradeoff between predictive accuracy and compression. These issues are critically examined by drawing on philosophy. While there is evidence that DNNs are able to represent non-trivial inferential relations between concepts, our ability to identify concepts is severely limited.
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
