CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016
Yuanjun Xiong, Limin Wang, Zhe Wang, Bowen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc Van Gool, Xiaoou Tang

TL;DR
This paper describes a high-performing video classification method for ActivityNet Challenge 2016, combining advanced deep models, novel aggregation techniques, and audio features, resulting in first place with 93.23% mAP.
Contribution
The paper introduces new aggregation schemes and integrates audio features into the temporal segment network framework for improved untrimmed video classification.
Findings
Achieved 93.23% mAP on the test set
Secured first place in ActivityNet Challenge 2016
Enhanced performance with ensemble of deep models
Abstract
This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further raise the performance via a number of other techniques. Specifically, we use the latest deep model architectures, e.g., ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate audio as a complementary channel, extracting relevant information via a CNN applied to the spectrograms. With these techniques, we derive an ensemble of deep models which, together, attained a high classification accuracy (mAP 93.23%) on the testing set and secured first place in the challenge.
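The two aggregation schemes named in the abstract can be illustrated with a small sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so here top-k pooling is taken to mean averaging the k highest snippet scores per class, and attention-weighted pooling is taken to mean a softmax-weighted average over snippets. The function names `top_k_pool` and `attention_pool` are hypothetical, and the attention logits, which would normally be learned, are passed in as plain numbers.

```python
import math


def top_k_pool(snippet_scores, k):
    """Average the k highest scores of each class across all snippets.

    snippet_scores: list of per-snippet score lists, one score per class.
    Assumption: this is one common reading of "top-k pooling".
    """
    num_classes = len(snippet_scores[0])
    pooled = []
    for c in range(num_classes):
        # Sort this class's scores over snippets, highest first.
        column = sorted((s[c] for s in snippet_scores), reverse=True)
        top = column[:k]
        pooled.append(sum(top) / len(top))
    return pooled


def attention_pool(snippet_scores, attention_logits):
    """Softmax-weighted average of snippet scores.

    attention_logits: one number per snippet (hypothetical input; in a
    trained model these would come from a learned attention module).
    """
    # Numerically stable softmax over snippets.
    m = max(attention_logits)
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    num_classes = len(snippet_scores[0])
    return [
        sum(w * s[c] for w, s in zip(weights, snippet_scores))
        for c in range(num_classes)
    ]


# Three snippets, two classes (made-up scores for illustration only).
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
video_topk = top_k_pool(scores, k=2)
video_attn = attention_pool(scores, attention_logits=[0.0, 0.0, 0.0])
```

Under these assumptions, setting k to 1 reduces top-k pooling to max pooling and setting k to the number of snippets reduces it to average pooling; equal attention logits likewise reduce attention-weighted pooling to a plain average.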
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
