YOWO-Plus: An Incremental Improvement
Jianhua Yang

TL;DR
This paper presents YOWO-Plus, an improved real-time spatio-temporal action detector that is more accurate than the official YOWO, together with a lightweight variant, YOWO-Nano, that runs at over 90 FPS.
Contribution
The paper introduces YOWO-Plus with enhanced detection accuracy and a new lightweight YOWO-Nano model, significantly improving speed and performance over prior YOWO versions.
Findings
YOWO-Plus achieves 84.9% frame mAP on UCF101-24.
YOWO-Nano attains over 90 FPS with 81.0% frame mAP on UCF101-24.
The authors describe YOWO-Nano as the fastest state-of-the-art action detector.
Abstract
In this technical report, we would like to introduce our updates to YOWO, a real-time method for spatio-temporal action detection. We make a bunch of little design changes to make it better. For the network structure, we use the same components as the official YOWO implementation, including 3D-ResNext-101 and YOLOv2, but we use better pretrained weights from our reimplemented YOLOv2, which outperforms the official YOLOv2. We also optimize the label assignment used in YOWO. To detect action instances accurately, we deploy GIoU loss for box regression. After our incremental improvement, YOWO achieves 84.9% frame mAP and 50.5% video mAP on UCF101-24, significantly higher than the official YOWO. On AVA, our optimized YOWO achieves 20.6% frame mAP with 16 frames, also exceeding the official YOWO. With 32 frames, our YOWO achieves 21.6% frame mAP at 25 FPS on an RTX 3090 GPU. We name the…
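The abstract mentions GIoU loss for box regression without spelling it out. The following is a minimal sketch of the standard generalized IoU loss, not the authors' implementation. It assumes boxes in corner format (x1, y1, x2, y2), and the function names `giou` and `giou_loss` are hypothetical, chosen for illustration only.

```python
def giou(box_a, box_b, eps=1e-9):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Areas of the two boxes (clamped so degenerate boxes give 0).
    area_a = max(ax2 - ax1, 0.0) * max(ay2 - ay1, 0.0)
    area_b = max(bx2 - bx1, 0.0) * max(by2 - by1, 0.0)

    # Intersection rectangle.
    inter_w = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    inter_h = max(min(ay2, by2) - max(ay1, by1), 0.0)
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / (union + eps)
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / (enclose + eps)


def giou_loss(pred_box, target_box):
    """Regression loss: 0 for a perfect match, up to 2 for distant boxes."""
    return 1.0 - giou(pred_box, target_box)


if __name__ == "__main__":
    print(giou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)))
```

Unlike plain IoU, the enclosing-box term keeps the loss informative when the predicted and target boxes do not overlap at all, which is the usual motivation for using it in box regression.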
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Average Pooling · Global Average Pooling · Batch Normalization · Softmax · 1x1 Convolution · Convolution · Max Pooling · Darknet-19 · YOLOv2
