Evaluating Context for Deep Object Detectors
Osman Semih Kayhan, Jan C. van Gemert

TL;DR
This paper systematically evaluates how deep object detectors use scene context, grouping them by whether they crop the input, crop the feature map, or do not crop at all. Because these groups see different amounts of context, the best detector for a task may depend on the application.
Contribution
The paper groups deep object detectors into three categories by context use (no context, partial context, full context) and builds a fully controlled dataset with varying context to analyze how context affects detection.
Findings
Single-stage and two-stage detectors can and will use context by virtue of their large receptive fields.
The best choice of object detector may therefore depend on the application context.
Gradually removing the background context and the foreground object on MS COCO changes detection accuracy.
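The link between receptive field and context use can be made concrete with a small calculation. The sketch below is illustrative and not taken from the paper: it applies the standard recurrence for a stack of convolution or pooling layers (the receptive field grows by `(kernel - 1)` times the accumulated stride at each layer) and ignores dilation and padding. The function name `receptive_field` and the `toy_backbone` layer list are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Assumes no dilation; padding does not change the receptive field size.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent units at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap widens the field by one jump
        jump *= stride             # striding spreads subsequent taps further apart
    return rf


# Hypothetical toy backbone: five 3x3 layers, two of them with stride 2.
toy_backbone = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
print(receptive_field(toy_backbone))  # 21
```

Even this shallow stack sees a 21-pixel-wide window. Real detector backbones are far deeper, so a unit inside a cropped feature map can still depend on pixels well outside the object box.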
Abstract
Which object detector is suitable for your context-sensitive task? Deep object detectors exploit scene context for recognition differently. In this paper, we group object detectors into three categories in terms of context use: no context by cropping the input (RCNN), partial context by cropping the feature map (two-stage methods) and full context without any cropping (single-stage methods). We systematically evaluate the effect of context for each deep detector category. We create a fully controlled dataset for varying context and investigate the context for deep detectors. We also evaluate gradually removing the background context and the foreground object on MS COCO. We demonstrate that single-stage and two-stage object detectors can and will use the context by virtue of their large receptive field. Thus, choosing the best object detector may depend on the application context.
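One way to picture the background and foreground removal experiments is as pixel masking around a ground-truth box. The following is a minimal sketch under assumptions, not the authors' protocol: the source does not state how context is removed, so the fill value, the `margin` parameter, and the function names `keep_context` and `remove_foreground` are all hypothetical. Images are plain nested lists to keep the example dependency-free.

```python
def keep_context(image, box, margin, fill=0):
    """Blank everything outside the box grown by `margin` pixels on each side.

    image: list of rows, each a list of pixel values.
    box: (x0, y0, x1, y1), with x1 and y1 exclusive.
    A smaller margin means less background context is kept.
    """
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    left, top = max(0, x0 - margin), max(0, y0 - margin)
    right, bottom = min(width, x1 + margin), min(height, y1 + margin)
    return [
        [pixel if (left <= x < right and top <= y < bottom) else fill
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


def remove_foreground(image, box, fill=0):
    """Blank the pixels inside the box, leaving only the background."""
    x0, y0, x1, y1 = box
    return [
        [fill if (x0 <= x < x1 and y0 <= y < y1) else pixel
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Toy 5x5 image of ones with a 1x1 "object" at the centre.
image = [[1] * 5 for _ in range(5)]
box = (2, 2, 3, 3)
for margin in (2, 1, 0):  # gradually shrink the visible context
    kept = sum(map(sum, keep_context(image, box, margin)))
    print(margin, kept)
```

Running a detector on the outputs for decreasing `margin`, and on `remove_foreground(image, box)`, would give one accuracy value per context level, which is the kind of comparison the abstract describes.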
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
