Bringing the Context Back into Object Recognition, Robustly
Klara Janouskova, Cristian Gavrus, Jiri Matas

TL;DR
This paper introduces L2R2, a new method that combines localization and recognition to use context effectively while remaining robust to distribution shifts and background biases.
Contribution
L2R2 is a novel approach that uses zero-shot detection to localize the foreground before recognition. This improves robustness and lets the model exploit context without over-relying on the background.
Findings
L2R2 improves recognition accuracy across multiple datasets.
The method maintains robustness under distribution shifts.
Localization enhances recognition performance in diverse scenarios.
Abstract
In object recognition, both the subject of interest (referred to as foreground, FG, for simplicity) and its surrounding context (background, BG) may play an important role. However, standard supervised learning often leads to unintended over-reliance on the BG, limiting model robustness in real-world deployment settings. The problem is mainly addressed by suppressing the BG, which sacrifices context information for improved generalization. We propose "Localize to Recognize Robustly" (L2R2), a novel recognition approach that exploits the benefits of context-aware classification while maintaining robustness to distribution shifts. L2R2 leverages advances in zero-shot detection to localize the FG before recognition. It improves the performance of both standard recognition with supervised training and multimodal zero-shot recognition with VLMs, while being robust to long-tail BGs and…
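The abstract describes L2R2 only at a high level: a zero-shot detector localizes the FG, and recognition then draws on both the FG and its context. The sketch below is a minimal, hypothetical illustration of that localize-then-recognize pattern, not the authors' implementation. The fusion rule (a weighted average of softmax probabilities from an FG-crop classifier and a full-image classifier), the `fg_weight` parameter, and the toy detector and classifier are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # toy grayscale image: a list of pixel rows
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the localized foreground (FG) region out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def localize_then_recognize(
    image: Image,
    detect: Callable[[Image], Box],
    classify_fg: Callable[[Image], Sequence[float]],
    classify_full: Callable[[Image], Sequence[float]],
    fg_weight: float = 0.5,
) -> List[float]:
    """Hypothetical fusion: blend class probabilities of the FG crop and the full image.

    fg_weight = 1.0 ignores the background (BG) entirely; 0.0 ignores the crop.
    """
    fg_probs = softmax(classify_fg(crop(image, detect(image))))
    full_probs = softmax(classify_full(image))
    return [fg_weight * p + (1.0 - fg_weight) * q
            for p, q in zip(fg_probs, full_probs)]


# Toy stand-ins (hypothetical): a fixed-box "detector" and a classifier that
# scores class 0 by mean brightness and class 1 by mean darkness.
def toy_detect(image: Image) -> Box:
    return (1, 1, 3, 3)


def toy_classify(image: Image) -> List[float]:
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return [mean, 1.0 - mean]


# 4x4 image: a bright 2x2 object in the centre on a dark background.
image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
probs = localize_then_recognize(image, toy_detect, toy_classify, toy_classify)
print([round(p, 3) for p in probs])  # → [0.554, 0.446]
```

In this sketch, setting `fg_weight` to 1.0 reduces the method to pure BG suppression (only the crop is classified), while lower values let the full-image context contribute to the final prediction.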
Taxonomy
Topics: Face and Expression Recognition
