Bringing the Context Back into Object Recognition, Robustly

Klara Janouskova; Cristian Gavrus; Jiri Matas

arXiv:2411.15933·cs.CV·March 12, 2025

TL;DR

This paper introduces L2R2, a new method that combines localization and recognition to utilize context effectively while maintaining robustness against distribution shifts and background biases.

Contribution

L2R2 is a novel approach that integrates zero-shot detection for localization with recognition, enhancing robustness and leveraging context without over-reliance on backgrounds.

Findings

1. L2R2 improves recognition accuracy across multiple datasets.
2. The method maintains robustness under distribution shifts.
3. Localization enhances recognition performance in diverse scenarios.

Abstract

In object recognition, both the subject of interest (referred to as foreground, FG, for simplicity) and its surrounding context (background, BG) may play an important role. However, standard supervised learning often leads to unintended over-reliance on the BG, limiting model robustness in real-world deployment settings. The problem is mainly addressed by suppressing the BG, sacrificing context information for improved generalization. We propose "Localize to Recognize Robustly" (L2R2), a novel recognition approach which exploits the benefits of context-aware classification while maintaining robustness to distribution shifts. L2R2 leverages advances in zero-shot detection to localize the FG before recognition. It improves the performance of both standard recognition with supervised training and multimodal zero-shot recognition with VLMs, while being robust to long-tail BGs and…
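
The localize-then-recognize idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `detect` and `classify` are hypothetical callables standing in for a zero-shot detector and a classifier, and the fusion rule (a weighted average of FG-crop and full-image probabilities controlled by a hypothetical `fg_weight`) is not taken from the paper, which may combine FG and context differently.

```python
import numpy as np


def crop_foreground(image, box):
    """Crop an H x W x C image to an (x0, y0, x1, y1) pixel box, clipped to the image."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, x1 = max(0, int(x0)), min(w, int(x1))
    y0, y1 = max(0, int(y0)), min(h, int(y1))
    if x1 <= x0 or y1 <= y0:
        return image  # degenerate box: fall back to the full image
    return image[y0:y1, x0:x1]


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def localize_then_recognize(image, detect, classify, fg_weight=0.5):
    """Classify the FG crop and the full image, then average the probabilities.

    ASSUMPTION: the weighted-average fusion is illustrative only; it is not
    claimed to be the fusion used by L2R2.
    """
    box = detect(image)  # hypothetical detector: returns a box or None
    fg = crop_foreground(image, box) if box is not None else image
    p_fg = softmax(classify(fg))       # FG-only prediction
    p_full = softmax(classify(image))  # context-aware prediction
    probs = fg_weight * p_fg + (1.0 - fg_weight) * p_full
    return int(np.argmax(probs)), probs


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_detect(image):
    return (2, 2, 6, 6)  # fixed box around the bright square


def toy_classify(image):
    mean = float(image.mean())
    return np.array([mean, 1.0 - mean])  # two-class logits from brightness


image = np.zeros((8, 8, 3))
image[2:6, 2:6] = 1.0  # bright "object" on a dark background
label, probs = localize_then_recognize(image, toy_detect, toy_classify)
full_only_label = int(np.argmax(toy_classify(image)))
```

In this toy setup the full image alone is dominated by the dark background, while the fused prediction follows the localized object. With real models, `fg_weight` would need to be chosen on validation data.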

Taxonomy

Topics: Face and Expression Recognition