TL;DR
This paper introduces a guided learning approach with an added concept layer in CNNs to improve interpretability by aligning learned features with human-perceived concepts without sacrificing accuracy.
Contribution
It proposes a novel concept-guided training method that enhances interpretability and transferability of CNN features while maintaining high prediction accuracy.
Findings
The learned concepts are consistent with human perception.
The learned concepts transfer to new object classes that share similar concepts.
Prediction accuracy is maintained while the model gains interpretability.
Abstract
Learning concepts that are consistent with human perception is important for Deep Neural Networks to win end-user trust. Post-hoc interpretation methods lack transparency in the feature representations learned by the models. This work proposes a guided learning approach with an additional concept layer in a CNN-based architecture to learn the associations between visual features and word phrases. We design an objective function that optimizes both prediction accuracy and the semantics of the learned feature representations. Experimental results demonstrate that the proposed model can learn concepts that are consistent with human perception, together with their corresponding contributions to the model decision, without compromising accuracy. Further, these learned concepts are transferable to new classes of objects that have similar concepts.
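The abstract does not spell out the objective function, so the following is only a minimal sketch of what a joint objective of this kind could look like, not the authors' formulation. It assumes a cross-entropy term for the class prediction, a binary cross-entropy term that pushes concept-layer activations toward word-phrase labels, and a hypothetical weight `lam` balancing the two. It also assumes the class logits are a linear function of the concept activations, which is one simple way per-concept contributions to a decision could be read off. All function names and parameters are illustrative.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_loss(class_logits, true_class):
    # Cross-entropy on the final class prediction.
    return -math.log(softmax(class_logits)[true_class])

def concept_loss(concept_logits, concept_targets, eps=1e-12):
    # Mean binary cross-entropy between concept activations and
    # 0/1 word-phrase labels (assumed supervision signal).
    total = 0.0
    for z, t in zip(concept_logits, concept_targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(concept_logits)

def joint_loss(class_logits, true_class, concept_logits, concept_targets, lam=0.5):
    # Assumed form: prediction term plus lam times the semantic term.
    return class_loss(class_logits, true_class) + lam * concept_loss(
        concept_logits, concept_targets
    )

def concept_contributions(concept_probs, weights):
    # weights[c][k] is the (hypothetical) weight from concept k to class c.
    # Each entry is that concept's additive contribution to the class logit.
    return [[w * p for w, p in zip(row, concept_probs)] for row in weights]

def class_logits_from_concepts(concept_probs, weights):
    # A class logit is the sum of its per-concept contributions.
    return [sum(row) for row in concept_contributions(concept_probs, weights)]
```

With `lam=0` the sketch reduces to ordinary classification training; increasing `lam` puts more weight on aligning the concept layer with the word-phrase labels. Under the linear assumption, the contributions of the concepts to a class sum exactly to that class's logit, which is what makes them straightforward to inspect.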
