Marginalized Average Attentional Network for Weakly-Supervised Learning
Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung

TL;DR
This paper introduces MAAN, a network that suppresses the dominant response of the most salient regions in weakly-supervised temporal action localization, so that dense, integral action regions in videos are localized more accurately.
Contribution
The paper proposes a marginalized average attentional network with a new MAA module and a fast algorithm, improving the localization of dense action regions in weakly-supervised settings.
Findings
Reports superior weakly-supervised temporal action localization performance on large-scale video datasets.
Effectively reduces response differences between salient and non-salient regions.
Provides a theoretical proof that the MAA module reduces the difference in responses between the most salient regions and the others.
Abstract
In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others.…
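The aggregation described in the abstract (sample subsets of snippet features according to per-snippet probabilities, average each subset, then take the expectation) can be illustrated with a brute-force sketch. This is not the authors' implementation or their fast algorithm: it enumerates all 2^T subsets, so it is only practical for tiny T. The function name `maa_bruteforce`, the independent Bernoulli sampling of each snippet, and the convention that the empty subset contributes a zero vector are assumptions made for illustration.

```python
import itertools

import numpy as np


def maa_bruteforce(x, p):
    """Exact expectation of averaged subset features by full enumeration.

    x: (T, d) array of snippet features.
    p: (T,) array of inclusion probabilities in [0, 1].

    Assumptions (not confirmed by the source): each snippet is included
    independently with probability p[i], and the empty subset contributes
    a zero vector to the expectation.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    T, d = x.shape
    h = np.zeros(d)
    # Enumerate every binary inclusion mask z in {0, 1}^T.
    for mask in itertools.product([0, 1], repeat=T):
        z = np.array(mask)
        # Probability of drawing exactly this subset.
        prob = np.prod(np.where(z == 1, p, 1.0 - p))
        k = z.sum()
        if k > 0:
            # Average of the selected snippet features, weighted by prob.
            h += prob * (z @ x) / k
    return h
```

For example, with two one-dimensional snippets `x = [[1.0], [3.0]]` and `p = [0.5, 0.5]`, the four subsets contribute 0, 0.25, 0.75, and 0.5, so the sketch returns `[1.5]`; with `p = [1.0, 1.0]` it reduces to the plain average `[2.0]`.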
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
