Improving Object Detection via Local-global Contrastive Learning

Danai Triantafyllidou; Sarah Parisot; Ales Leonardis; Steven McDonagh

arXiv:2410.05058 · cs.CV · October 28, 2024

Open Access

TL;DR

This paper introduces a contrastive learning-based image translation method that improves cross-domain object detection without requiring object annotations, effectively handling complex scenes with multiple objects under domain shifts.

Contribution

The paper proposes a novel local-global contrastive learning framework with spatial attention masks for unsupervised domain adaptation in object detection, eliminating the need for object annotations.

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Effectively handles scenes with multiple objects under domain shifts.

03. Does not require detector fine-tuning or object annotations.

Abstract

Visual domain gaps often impact object detection performance. Image-to-image translation can mitigate this effect, where contrastive approaches enable learning of the image-to-image mapping under unsupervised regimes. However, existing methods often fail to handle content-rich scenes with multiple object instances, which manifests in unsatisfactory detection performance. Sensitivity to such instance-level content is typically only gained through object annotations, which can be expensive to obtain. Towards addressing this issue, we present a novel image-to-image translation method that specifically targets cross-domain object detection. We formulate our approach as a contrastive learning framework with an inductive prior that optimises the appearance of object instances through spatial attention masks, implicitly delineating the scene into foreground regions associated with the target…
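The abstract describes a contrastive objective shaped by spatial attention masks but is cut off before giving details, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes a patch-wise InfoNCE loss, where each translated-image patch should match the source patch at the same location, and it assumes the attention mask is used as a soft foreground weight per patch. The function names `patch_info_nce` and `masked_contrastive_loss`, the temperature `tau`, and the weighting scheme are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def patch_info_nce(queries, keys, tau=0.07):
    """Per-patch InfoNCE loss.

    queries[i] is the feature of patch i in the translated image and
    keys[i] is the feature of the same location in the source image.
    keys[i] is the positive for queries[i]; all other keys are negatives.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    losses = []
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / tau for kj in k]
        # Numerically stable log-sum-exp over positive and negatives.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum - logits[i])
    return losses


def masked_contrastive_loss(queries, keys, attention, tau=0.07):
    """Split the per-patch loss into foreground and background terms.

    ASSUMPTION: attention[i] in [0, 1] is a soft foreground weight for
    patch i. How the paper actually uses its masks is not stated in the
    truncated abstract.
    """
    losses = patch_info_nce(queries, keys, tau)
    fg_weight = sum(attention)
    bg_weight = sum(1.0 - a for a in attention)
    fg = sum(a * l for a, l in zip(attention, losses)) / fg_weight if fg_weight > 0 else 0.0
    bg = sum((1.0 - a) * l for a, l in zip(attention, losses)) / bg_weight if bg_weight > 0 else 0.0
    return fg, bg
```

Keeping the foreground and background terms separate lets a caller weight object regions differently from the rest of the scene. That is one plausible reading of "local-global", offered here as an assumption.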

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and ELM

Methods: Softmax · Attention Is All You Need · Contrastive Learning