A Model of Understanding in Deep Learning Systems
David Peter Wallis Freeborn

TL;DR
This paper introduces a model of systematic understanding for machine learning systems, emphasizing internal models, stable principles, and reliable prediction, and discusses how deep learning systems partially achieve this but fall short of scientific understanding.
Contribution
It proposes a formal model of understanding in ML systems and analyzes how deep learning systems align or misalign with this model.
Findings
Deep learning systems can achieve some aspects of understanding.
Current systems often lack symbolic alignment with targets.
Deep systems are only weakly unifying and not fully reductive.
Abstract
I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems often can and do achieve such understanding. However they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
