DenseBox: Unifying Landmark Localization with End to End Object Detection
Lichao Huang, Yi Yang, Yafeng Deng, Yinan Yu

TL;DR
DenseBox is a unified fully convolutional network that performs end-to-end object detection and landmark localization, achieving state-of-the-art results on the MALF face detection and KITTI car detection benchmarks.
Contribution
It demonstrates that a single FCN can accurately detect multiple object types and that multi-task learning with landmark localization enhances detection accuracy.
Findings
Achieves state-of-the-art detection accuracy on the MALF face detection and KITTI car detection benchmarks.
Efficiently detects multiple object types with a single FCN.
Multi-task learning with landmark localization improves detection performance.
Abstract
How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences across all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when combined with landmark localization through multi-task learning, DenseBox further improves object detection accuracy. We present experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, which indicate that DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.
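To make "directly predicts bounding boxes and object class confidences across all locations" more concrete, here is a minimal sketch of how a dense per-location output could be decoded into boxes. The abstract does not specify the output layout, so everything here is an assumption for illustration: the function name `decode_dense_output`, the `(left, top, right, bottom)` offset encoding, the `stride` between output cells, and the confidence `threshold` are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score) in input-image pixels
Box = Tuple[float, float, float, float, float]
Offsets = Tuple[float, float, float, float]


def decode_dense_output(
    score_map: List[List[float]],
    offset_map: List[List[Offsets]],
    stride: int = 4,
    threshold: float = 0.5,
) -> List[Box]:
    """Turn per-location predictions into image-space boxes.

    Assumed layout (illustrative only): score_map[r][c] is a confidence
    in [0, 1], and offset_map[r][c] holds (left, top, right, bottom)
    distances, in input pixels, from that location to the box edges.
    """
    boxes: List[Box] = []
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score < threshold:
                continue  # skip low-confidence locations
            # Map the output cell back to input-image coordinates.
            cx = c * stride
            cy = r * stride
            left, top, right, bottom = offset_map[r][c]
            boxes.append((cx - left, cy - top, cx + right, cy + bottom, score))
    # Highest-confidence boxes first.
    boxes.sort(key=lambda b: b[4], reverse=True)
    return boxes


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.2, 0.3]]
    zero = (0.0, 0.0, 0.0, 0.0)
    offsets = [[zero, (2.0, 3.0, 4.0, 5.0)], [zero, zero]]
    print(decode_dense_output(scores, offsets))
```

Because every location above the threshold emits its own box, neighboring locations on the same object produce overlapping candidates. A full pipeline would typically merge them with a step such as non-maximum suppression, which this sketch omits.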
Taxonomy
Topics: Advanced Neural Network Applications · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
