Inner-Scene Similarities as a Contextual Cue for Object Detection
Noa Arbel, Tamar Avraham, Michael Lindenbaum

TL;DR
This paper introduces CISS, a novel method that leverages inner-scene similarity to enhance object detection accuracy by re-scoring candidate regions based on their visual similarity to other regions within the same image.
Contribution
The paper proposes the CISS algorithm, which uses inner-scene similarity as a contextual cue to improve object detection, in particular by detecting partly occluded objects and reducing false positives.
Findings
Improved detection of partly occluded objects.
Reduced false alarms in object detection.
Improved detection scores on the PASCAL VOC dataset.
Abstract
Using image context is an effective approach for improving object detection. Previously proposed methods used contextual cues that rely on semantic or spatial information. In this work, we explore a different kind of contextual information: inner-scene similarity. We present the CISS (Context by Inner Scene Similarity) algorithm, which is based on the observation that two visually similar sub-image patches are likely to share semantic identities, especially when both appear in the same image. CISS uses base-scores provided by a base detector and performs as a post-detection stage. For each candidate sub-image (denoted anchor), the CISS algorithm finds a few similar sub-images (denoted supporters), and, using them, calculates a new enhanced score for the anchor. This is done by utilizing the base-scores of the supporters and a pre-trained dependency model. The new scores are modeled as a…
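The anchor/supporter re-scoring described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `rescore_with_supporters`, the use of cosine similarity over feature vectors, and the parameters `k` and `alpha` are hypothetical. In particular, the similarity-weighted average below is only a simple stand-in for the paper's pre-trained dependency model, whose details are not given in this excerpt.

```python
import numpy as np

def rescore_with_supporters(features, base_scores, k=2, alpha=0.5):
    """Re-score each candidate (anchor) using its k most similar candidates (supporters).

    Hypothetical stand-in for the dependency model: the new score blends the
    anchor's base score with a similarity-weighted mean of supporter base scores.
    """
    features = np.asarray(features, dtype=float)
    base_scores = np.asarray(base_scores, dtype=float)

    # Cosine similarity between all pairs of candidate regions (assumed measure).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an anchor cannot support itself

    n = len(base_scores)
    k = min(k, n - 1)
    new_scores = base_scores.copy()
    if k <= 0:
        return new_scores

    for anchor in range(n):
        # Indices of the k candidates most similar to this anchor.
        supporters = np.argsort(sim[anchor])[-k:]
        weights = np.clip(sim[anchor, supporters], 0.0, None)
        if weights.sum() == 0:
            continue  # no positively similar supporter; keep the base score
        support = np.average(base_scores[supporters], weights=weights)
        new_scores[anchor] = (1 - alpha) * base_scores[anchor] + alpha * support
    return new_scores

# Toy data: candidates 0 and 1 look alike; candidate 1 has a low base score,
# as a partly occluded object might.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
base_scores = [0.9, 0.3, 0.2]
new_scores = rescore_with_supporters(features, base_scores, k=1, alpha=0.5)
```

In this toy case the low-scoring candidate 1 is pulled up by its confident look-alike, candidate 0, which mirrors the intuition that visually similar regions in the same image tend to share a semantic identity.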
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Advanced Vision and Imaging
