TL;DR
This paper introduces a hybrid ensemble decoder and a progressive fine-tuning framework to improve cross-domain few-shot object detection, demonstrating significant performance gains and robustness across multiple datasets.
Contribution
The work proposes a hybrid ensemble decoder with denoising queries and a plateau-aware progressive fine-tuning schedule, improving generalization and optimization stability in few-shot object detection (FSOD) without adding parameters.
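To make the "ensemble without extra parameters" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The names `decoder_layer` and `ensemble_decode`, the scalar `weight` standing in for pretrained decoder weights, the alternation between inherited and freshly initialized queries, and the plain averaging of branch outputs are all assumptions made for illustration.

```python
import random


def decoder_layer(queries, weight):
    """Toy stand-in for a decoder layer: scales every query by one shared weight."""
    return [[weight * v for v in q] for q in queries]


def ensemble_decode(queries, num_branches=3, weight=0.5, noise=0.1, seed=0):
    """Run one shared layer, then several parallel branches, and average them.

    Hypothetical sketch: every branch reuses the same `weight`, so adding
    branches adds no parameters. Diversity comes only from the queries.
    """
    rng = random.Random(seed)
    dim = len(queries[0])

    # Shared layer: computed once and made available to every branch.
    shared_out = decoder_layer(queries, weight)

    branch_outputs = []
    for b in range(num_branches):
        if b % 2 == 0:
            # Even branches inherit their queries from the shared layer.
            branch_queries = shared_out
        else:
            # Odd branches start from newly initialized (random) queries.
            branch_queries = [
                [rng.gauss(0.0, noise) for _ in range(dim)] for _ in queries
            ]
        branch_outputs.append(decoder_layer(branch_queries, weight))

    # Ensemble step (assumed here to be a simple element-wise mean).
    n = len(branch_outputs)
    return [
        [sum(out[i][j] for out in branch_outputs) / n for j in range(dim)]
        for i in range(len(queries))
    ]


if __name__ == "__main__":
    print(ensemble_decode([[1.0, 2.0], [3.0, 4.0]], num_branches=3))
```

In a real detector the branches would emit boxes and class scores, and the combination step would likely be more involved than a mean. The sketch only shows how shared weights plus varied queries can yield multiple predictions to combine.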
Findings
Achieves an average score of 41.9 on RF100-VL in the 10-shot setting, outperforming recent methods.
Improves robustness to out-of-distribution samples on a mixed-domain test set.
Demonstrates effectiveness across multiple datasets including CD-FSOD, ODinW-13, and RF100-VL.
Abstract
Few-shot object detection (FSOD) is challenging due to unstable optimization and limited generalization arising from the scarcity of training samples. To address these issues, we propose a hybrid ensemble decoder that enhances generalization during fine-tuning. Inspired by ensemble learning, the decoder comprises a shared hierarchical layer followed by multiple parallel decoder branches, where each branch employs denoising queries either inherited from the shared layer or newly initialized to encourage prediction diversity. This design fully exploits pretrained weights without introducing additional parameters, and the resulting diverse predictions can be effectively ensembled to improve generalization. We further leverage a unified progressive fine-tuning framework with a plateau-aware learning rate schedule, which stabilizes optimization and achieves strong few-shot adaptation without…
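The abstract does not spell out the plateau-aware learning rate schedule, so the following is only a generic sketch of the idea: lower the learning rate once a monitored loss stops improving. The class name `PlateauAwareLR` and the parameters `factor`, `patience`, `min_delta`, and `min_lr` are hypothetical and not taken from the paper. The behavior resembles PyTorch's `ReduceLROnPlateau` in spirit, written here in plain Python so it runs without dependencies.

```python
class PlateauAwareLR:
    """Hypothetical plateau-aware schedule: decay the LR when the loss stalls."""

    def __init__(self, lr, factor=0.5, patience=2, min_delta=1e-3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor        # multiplicative decay applied on a plateau
        self.patience = patience    # non-improving steps tolerated before decay
        self.min_delta = min_delta  # smallest drop that counts as improvement
        self.min_lr = min_lr        # floor for the learning rate
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record one evaluation loss and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            # Real improvement: remember it and reset the plateau counter.
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                # Plateau detected: decay the learning rate and start over.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stale = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauAwareLR(lr=0.1, patience=2)
    for loss in [1.0, 0.8, 0.8, 0.8, 0.79, 0.79, 0.79]:
        print(loss, sched.step(loss))
```

With few training samples the loss curve is noisy, so `patience` and `min_delta` control how readily the schedule reacts. The values above are placeholders, not tuned settings.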
