Spatial-temporal Concept based Explanation of 3D ConvNets
Ying Ji, Yu Wang, Kensaku Mori, Jien Kato

TL;DR
This paper introduces a novel framework for interpreting 3D ConvNets in video recognition by using supervoxels and voxel importance scores, enabling understanding of spatial-temporal concepts influencing decisions.
Contribution
It proposes the 3D ACE framework that leverages high-level supervoxels and voxel scoring to interpret 3D ConvNets, addressing the gap in explainability for video data.
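As a rough illustration of what representing a video by supervoxels means, the sketch below assigns every voxel of a video tensor to a space-time segment. It is not the paper's segmentation: the function name `grid_supervoxels`, the `block` parameter, and the use of fixed blocks are assumptions made for this example, and a real supervoxel algorithm would produce content-adaptive regions instead.

```python
import numpy as np

def grid_supervoxels(video, block=(4, 8, 8)):
    """Assign each voxel of a (T, H, W, C) video to a block-shaped segment.

    Hypothetical stand-in for a real supervoxel algorithm: segments here are
    fixed space-time blocks, not content-adaptive regions.
    """
    t, h, w = video.shape[:3]
    bt, bh, bw = block
    # Index of the block that each frame / row / column falls into.
    ti = np.arange(t) // bt
    hi = np.arange(h) // bh
    wi = np.arange(w) // bw
    # Ceil division gives the number of blocks along height and width.
    n_h = -(-h // bh)
    n_w = -(-w // bw)
    # Combine the three block indices into one integer label per voxel.
    labels = (ti[:, None, None] * n_h + hi[None, :, None]) * n_w + wi[None, None, :]
    return labels

# Toy video: 8 frames of 16x16 RGB.
video = np.zeros((8, 16, 16, 3), dtype=np.float32)
labels = grid_supervoxels(video)
num_segments = int(labels.max()) + 1
```

The label volume has one integer per voxel, so any later step (grouping segments into concepts, scoring them) can select a segment with a boolean mask such as `labels == k`.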
Findings
The method discovers spatial-temporal concepts of varying importance levels.
These concepts are used to explore how each one influences action classification.
The framework is reported to interpret 3D ConvNet decisions effectively.
Abstract
Recent studies have achieved outstanding success in explaining 2D image recognition ConvNets. By contrast, because of the computation cost and complexity of video data, the explanation of 3D video recognition ConvNets is relatively less studied. In this paper, we present a 3D ACE (Automatic Concept-based Explanation) framework for interpreting 3D ConvNets. In our approach: (1) videos are represented using high-level supervoxels, which are straightforward for humans to understand; and (2) the interpreting framework estimates a score for each voxel, which reflects its importance in the decision procedure. Experiments show that our method can discover spatial-temporal concepts of different importance levels, and thus can explore in depth the influence of the concepts on a target task, such as action classification. The code is publicly available.
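The idea of a per-voxel importance score can be sketched with a simple occlusion test: mask one segment at a time and record how much a class score drops. This is a swapped-in technique chosen for illustration, since the abstract does not say how 3D ACE computes its scores. The names `occlusion_importance`, `score_fn`, and `toy_score` are hypothetical, and the toy scoring function stands in for a trained 3D ConvNet.

```python
import numpy as np

def occlusion_importance(video, labels, score_fn, fill=0.0):
    """Per-voxel importance: score drop when the voxel's segment is masked.

    Assumption for this sketch: importance is measured by occlusion, which
    is not necessarily the scoring used by 3D ACE.
    """
    base = score_fn(video)
    importance = np.zeros(labels.shape, dtype=np.float64)
    for seg in np.unique(labels):
        mask = labels == seg
        occluded = video.copy()
        occluded[mask] = fill  # blank out this segment in every channel
        # Every voxel in the segment inherits the segment's score drop.
        importance[mask] = base - score_fn(occluded)
    return importance

# Toy setup: 4 frames of 4x4 RGB, two segments split along time.
video = np.ones((4, 4, 4, 3))
labels = np.zeros((4, 4, 4), dtype=int)
labels[2:] = 1

def toy_score(v):
    # Hypothetical "class score" that only looks at the first two frames.
    return float(v[:2].mean())

imp = occlusion_importance(video, labels, toy_score)
```

In this toy case the segment covering the first two frames receives a high score and the other segment receives zero, because the stand-in scoring function ignores the later frames.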
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
