Recent Trends in 2D Object Detection and Applications in Video Event Recognition
Prithwish Jana, Partha Pratim Mohanta

TL;DR
This chapter reviews recent advances in 2D object detection, emphasizing deep learning methods and the application of object detection to video event recognition, along with datasets and performance comparisons.
Contribution
It provides a comprehensive overview of geometry-based and deep learning approaches, highlighting their integration in video event recognition and summarizing recent datasets and benchmarks.
Findings
Deep learning methods outperform traditional geometry-based techniques.
Unified architectures predict class and bounding boxes in a single pipeline.
Two-stage architectures improve detection accuracy in complex scenes.
Abstract
Object detection serves as a significant step in improving the performance of complex downstream computer vision tasks. It has been extensively studied for many years now, and current state-of-the-art 2D object detection techniques proffer superlative results even in complex images. In this chapter, we discuss the geometry-based pioneering works in object detection, followed by the recent breakthroughs that employ deep learning. Some of these use a monolithic architecture that takes an RGB image as input and passes it to a feed-forward ConvNet or vision Transformer. These methods thereby predict class probabilities and bounding-box coordinates, all in a single unified pipeline. Two-stage architectures, on the other hand, first generate region proposals and then feed them to a CNN to extract features and predict the object category and bounding box. We also elaborate upon the applications of object…
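To make the contrast between the two families concrete, here is a toy sketch of their control flow only, not of any method from the chapter. It assumes the network has already run: in the unified case, `head_outputs` stands in for the per-slot (class logits, box) predictions of a single forward pass; in the two-stage case, the hypothetical callables `propose` and `classify_region` stand in for a region-proposal step and a per-region CNN. The function names, the `(x1, y1, x2, y2)` box format, the reserved `"background"` label, and the 0.5 score threshold are all illustrative assumptions, and non-maximum suppression is deliberately omitted.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]      # assumed (x1, y1, x2, y2) pixel coords
Detection = Tuple[str, float, Box]           # (label, confidence, box)
HeadOutput = Tuple[Sequence[float], Box]     # (class logits, box) for one slot/region


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _pick(logits: Sequence[float], box: Box, class_names: Sequence[str],
          score_thresh: float) -> Optional[Detection]:
    """Keep the most probable class unless it is 'background' or below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = class_names[best]
    if label == "background" or probs[best] < score_thresh:
        return None
    return (label, probs[best], box)


def single_stage_decode(head_outputs: Sequence[HeadOutput],
                        class_names: Sequence[str],
                        score_thresh: float = 0.5) -> List[Detection]:
    """Unified pipeline: one forward pass already gives class logits and a box per slot."""
    dets: List[Detection] = []
    for logits, box in head_outputs:
        det = _pick(logits, box, class_names, score_thresh)
        if det is not None:
            dets.append(det)
    return dets


def two_stage_detect(image: Any,
                     propose: Callable[[Any], Sequence[Box]],
                     classify_region: Callable[[Any, Box], HeadOutput],
                     class_names: Sequence[str],
                     score_thresh: float = 0.5) -> List[Detection]:
    """Stage 1 proposes candidate regions; stage 2 classifies each and refines its box."""
    dets: List[Detection] = []
    for region in propose(image):
        logits, refined_box = classify_region(image, region)
        det = _pick(logits, refined_box, class_names, score_thresh)
        if det is not None:
            dets.append(det)
    return dets


if __name__ == "__main__":
    classes = ["background", "person", "car"]
    # Hypothetical single-pass outputs: the second slot is most likely background.
    outputs = [([0.1, 3.0, 0.2], (10, 10, 50, 80)), ([2.0, 0.1, 0.1], (0, 0, 5, 5))]
    print(single_stage_decode(outputs, classes))
```

The single-stage path touches each prediction once, whereas the two-stage path makes a second, per-region call for every proposal; that extra call is where a real two-stage detector spends its additional computation.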
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Absolute Position Encodings · Softmax
