Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection
Francesco Barbato, Umberto Michieli, Jijoong Moon, Pietro Zanuttigh, Mete Ozay

TL;DR
This paper introduces AuXFT, a novel few-shot personalized object detection method that efficiently adapts to user-specific classes on-device by translating features into an auxiliary space, achieving high performance with low resource usage.
Contribution
The paper proposes a new auxiliary feature space translation approach for few-shot personalized object detection, improving on existing models' personalization and efficiency.
Findings
Achieves 80% of upper-bound performance at only 32% of the inference time
Reduces VRAM usage to 13% and model size to 19% of baseline
Validates effectiveness on multiple datasets and benchmarks
Abstract
Recent years have seen object detection robotic systems deployed in several personal devices (e.g., home robots and appliances). This has highlighted a challenge in their design, i.e., they cannot efficiently update their knowledge to distinguish between general classes and user-specific instances (e.g., a dog vs. user's dog). We refer to this challenging task as Instance-level Personalized Object Detection (IPOD). The personalization task requires many samples for model tuning and optimization in a centralized server, raising privacy concerns. An alternative is provided by approaches based on recent large-scale Foundation Models, but their compute costs preclude on-device applications. In our work we tackle both problems at the same time, designing a Few-Shot IPOD strategy called AuXFT. We introduce a conditional coarse-to-fine few-shot learner to refine the coarse predictions made…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
