Tech Report: One-stage Lightweight Object Detectors
Deokki Hong

TL;DR
This tech report introduces one-stage lightweight object detectors, with separate GPU-target and CPU-target designs, that achieve a similar or better balance of accuracy and latency than existing benchmarks on the MS COCO dataset.
Contribution
The report proposes backbone and feature pyramid network (FPN) architectures that improve the speed-accuracy trade-off relative to existing models such as YOLOX-tiny.
Findings
The proposed GPU-target backbone is 1.43x faster than the YOLOX-tiny backbone.
The same backbone achieves 0.5 higher mAP than the YOLOX-tiny backbone on the MS COCO dataset.
Benchmarks and proposed detectors are compared in terms of parameter count, Gflops, GPU latency, CPU latency, and mAP.
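To make the two headline numbers concrete, the following minimal sketch shows one way a speed-up factor and an mAP gain could be computed from per-model measurements. It assumes that "speed" is compared as a ratio of latencies, which the summary does not state explicitly. The `DetectorResult` class, the function names, and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements from the report.

```python
from dataclasses import dataclass


@dataclass
class DetectorResult:
    """Hypothetical container for one model's benchmark numbers."""
    name: str
    latency_ms: float  # per-image latency, lower is better
    map_score: float   # COCO mAP, higher is better


def speedup(baseline: DetectorResult, candidate: DetectorResult) -> float:
    # Assumption: speed-up is the baseline latency divided by the candidate latency.
    return baseline.latency_ms / candidate.latency_ms


def map_gain(baseline: DetectorResult, candidate: DetectorResult) -> float:
    # Absolute difference in mAP points (candidate minus baseline).
    return candidate.map_score - baseline.map_score


# Placeholder numbers for illustration only; NOT taken from the report.
baseline = DetectorResult("baseline", latency_ms=14.3, map_score=30.0)
candidate = DetectorResult("candidate", latency_ms=10.0, map_score=30.5)

print(f"speed-up: {speedup(baseline, candidate):.2f}x")
print(f"mAP gain: {map_gain(baseline, candidate):+.1f}")
```

Under this reading, a model counts as a better trade-off when its speed-up is above 1 and its mAP gain is not negative.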
Abstract
This work designs one-stage lightweight detectors that perform well in terms of mAP and latency. Starting from baseline models that target GPU and CPU respectively, various operations are substituted for the main operations in the backbone networks of the baseline models. In addition to experiments on backbone networks and operations, several feature pyramid network (FPN) architectures are investigated. Benchmarks and proposed detectors are analyzed in terms of the number of parameters, Gflops, GPU latency, CPU latency, and mAP on the MS COCO dataset, a benchmark dataset for object detection. This work proposes network architectures that are similar to or better than the benchmarks when the trade-off between accuracy and latency is considered. For example, our proposed GPU-target backbone network outperforms that of YOLOX-tiny, which is selected as the benchmark, by 1.43x in speed and 0.5 mAP in accuracy on…
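Since latency is one of the reported metrics, a minimal sketch of a CPU latency measurement is given here, using only the Python standard library. The report's actual measurement protocol is not described in this summary, so the warm-up count, the number of runs, the use of the median, and the `dummy_forward` stand-in workload are all assumptions made for illustration.

```python
import statistics
import time


def measure_latency_ms(fn, warmup: int = 5, runs: int = 20) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_forward() -> int:
    # Hypothetical stand-in for a detector's forward pass.
    return sum(i * i for i in range(10_000))


latency = measure_latency_ms(dummy_forward)
print(f"median latency: {latency:.3f} ms")
```

For GPU latency the same pattern would need a device synchronization before each timestamp, because GPU kernels are launched asynchronously and a plain wall-clock timer would otherwise stop too early.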
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
