Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection

Francesco Barbato; Umberto Michieli; Jijoong Moon; Pietro Zanuttigh; Mete Ozay

arXiv:2407.01193 · cs.CV · July 2, 2024

Open Access

TL;DR

This paper introduces AuXFT, a novel few-shot personalized object detection method that efficiently adapts to user-specific classes on-device by translating features into an auxiliary space, achieving high performance with low resource usage.

Contribution

The paper proposes a new auxiliary feature space translation approach for few-shot personalized object detection, improving on existing models' personalization and efficiency.

Findings

01

Achieves 80% of upper-bound performance at only 32% of the inference time

02

Reduces VRAM usage to 13% and model size to 19% of the baseline

03

Validates effectiveness on multiple datasets and benchmarks
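
To put the reported percentages in perspective, the short sketch below converts each one into a reduction factor relative to the reference model. It uses only the three figures listed above. The helper name `reduction_factor` and the dictionary layout are illustrative assumptions and do not come from the paper.

```python
def reduction_factor(fraction_of_reference: float) -> float:
    """Return how many times lower a cost is, given its fraction of the reference cost."""
    if not 0 < fraction_of_reference <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return 1.0 / fraction_of_reference


# Fractions reported in the findings above (relative to the reference model).
reported = {"inference time": 0.32, "VRAM": 0.13, "model size": 0.19}

# Hypothetical helper structure: one reduction factor per reported resource.
factors = {name: reduction_factor(fraction) for name, fraction in reported.items()}

for name, factor in factors.items():
    print(f"{name}: about {factor:.1f}x lower than the reference")
```

Read this way, the findings correspond to roughly a 3x cut in inference time, a 7 to 8x cut in VRAM, and a 5x cut in model size, in exchange for retaining 80% of the upper-bound performance.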

Abstract

Recent years have seen object detection robotic systems deployed in several personal devices (e.g., home robots and appliances). This has highlighted a challenge in their design, i.e., they cannot efficiently update their knowledge to distinguish between general classes and user-specific instances (e.g., a dog vs. user's dog). We refer to this challenging task as Instance-level Personalized Object Detection (IPOD). The personalization task requires many samples for model tuning and optimization in a centralized server, raising privacy concerns. An alternative is provided by approaches based on recent large-scale Foundation Models, but their compute costs preclude on-device applications. In our work we tackle both problems at the same time, designing a Few-Shot IPOD strategy called AuXFT. We introduce a conditional coarse-to-fine few-shot learner to refine the coarse predictions made…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning