TL;DR
This paper introduces a perturbation-based video attribution method that accounts for spatiotemporal dependencies and works across diverse video understanding network architectures. A regularization term encourages smooth attribution results, and the method is validated with objective metrics and subjective evaluation.
Contribution
It proposes a generic video attribution method compatible with diverse network architectures, adds a novel regularization term for smoothness, and introduces objective metrics for evaluation.
Findings
Subjective evaluation shows effective attribution results.
Objective metrics show improved reliability over existing methods.
The method is compatible with various video understanding network architectures.
Abstract
Attribution methods offer a way to interpret opaque neural networks visually by identifying and visualizing the input regions or pixels that dominate a network's output. Visually explaining video understanding networks is challenging because of the spatiotemporal dependencies unique to video inputs and the 3D convolutional or recurrent structures these networks use. However, most existing attribution methods focus on networks that take a single image as input, and the few works devised specifically for video attribution fall short of handling the diverse structures of video understanding networks. In this paper, we investigate a generic perturbation-based attribution method that is compatible with diverse video understanding networks. In addition, we propose a novel regularization…
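To make the idea of perturbation-based attribution with a smoothness term concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract is truncated before the regularizer is described, so the sketch substitutes a generic total-variation-style penalty over time, height, and width. The names `perturb`, `smoothness_penalty`, `objective`, `toy_score`, and the weights `lam_area` and `lam_smooth` are all hypothetical, and the "network" is a toy scoring function rather than a real video model.

```python
import numpy as np

def perturb(video, mask, baseline):
    """Blend the video with a baseline: mask=1 keeps a voxel, mask=0 replaces it."""
    return mask * video + (1.0 - mask) * baseline

def smoothness_penalty(mask):
    """Generic total-variation-style penalty (an assumption, not the paper's term).

    Sums absolute differences between neighbouring mask values along
    time (axis 0), height (axis 1), and width (axis 2).
    """
    dt = np.abs(np.diff(mask, axis=0)).sum()
    dh = np.abs(np.diff(mask, axis=1)).sum()
    dw = np.abs(np.diff(mask, axis=2)).sum()
    return float(dt + dh + dw)

def objective(score_fn, video, mask, baseline, lam_area=0.1, lam_smooth=0.01):
    """Hypothetical loss: keep the score high with a small, smooth mask."""
    score = score_fn(perturb(video, mask, baseline))
    return -score + lam_area * float(mask.mean()) + lam_smooth * smoothness_penalty(mask)

def toy_score(x):
    """Stand-in for a network output: depends only on a central spatial box."""
    return float(x[:, 2:6, 2:6].mean())

rng = np.random.default_rng(0)
video = rng.random((4, 8, 8))        # (frames, height, width), grayscale toy clip
baseline = np.zeros_like(video)      # perturbed voxels are replaced by zeros

full_mask = np.ones_like(video)      # keeps everything
box_mask = np.zeros_like(video)      # keeps only the region the toy score uses
box_mask[:, 2:6, 2:6] = 1.0

loss_full = objective(toy_score, video, full_mask, baseline)
loss_box = objective(toy_score, video, box_mask, baseline)
```

In this toy setting, `box_mask` preserves the score while covering less of the input, at the cost of a nonzero smoothness penalty at the box edges. In a real setting the mask would be optimized by gradient descent against an actual video understanding network, which this sketch leaves out.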
