Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu, Quoc-Truong Truong, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Tam V. Nguyen

TL;DR
This paper introduces a Multi-Perspective Data Augmentation framework for few-shot object detection, enhancing synthetic sample diversity by considering foreground and background relationships, leading to significant performance improvements.
Contribution
The novel MPAD framework combines in-context learning, prompt aggregation, and background sampling to improve synthetic data quality for FSOD.
Findings
Achieves 17.5% nAP50 improvement on PASCAL VOC
Effectively enhances synthetic sample diversity
Outperforms traditional augmentation methods
Abstract
Recent few-shot object detection (FSOD) methods have focused on augmenting synthetic samples for novel classes, showing promising results thanks to the rise of diffusion models. However, the diversity of such datasets is often limited in representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. Inspired by the large margin principle, support samples play a vital role in defining class boundaries. Therefore, we design a Harmonic Prompt Aggregation Scheduler (HPAS) to mix prompt embeddings at each time step of the…
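The abstract is cut off before it says how HPAS weights the embeddings, so the following is only a rough illustration of what mixing two prompt embeddings at each time step could look like. The cosine schedule, the direction of the blend (from a "hard" prompt toward the novel-class prompt), the function names, and the toy vectors are all assumptions made for this sketch and are not taken from the paper.

```python
import math


def mix_weight(step, num_steps):
    """Hypothetical cosine schedule (an assumption, not the paper's HPAS rule).

    Returns 0.0 at the first step and rises smoothly to 1.0 at the last step.
    """
    if num_steps < 2:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * step / (num_steps - 1)))


def mix_embeddings(novel, hard, weight):
    """Element-wise convex combination of two equal-length embedding vectors."""
    if len(novel) != len(hard):
        raise ValueError("embeddings must have the same length")
    return [weight * n + (1.0 - weight) * h for n, h in zip(novel, hard)]


def mixed_prompt_schedule(novel, hard, num_steps):
    """Return one mixed embedding per time step, blending from hard to novel."""
    return [
        mix_embeddings(novel, hard, mix_weight(t, num_steps))
        for t in range(num_steps)
    ]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real prompt embeddings are far larger.
    novel_prompt = [1.0, 0.0, 2.0]
    hard_prompt = [0.0, 1.0, 0.0]
    for t, emb in enumerate(mixed_prompt_schedule(novel_prompt, hard_prompt, 5)):
        print(t, [round(x, 3) for x in emb])
```

In a diffusion pipeline, the embedding for step t would condition the denoiser at that step; how the actual scheduler chooses its weights is left to the full paper.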
Peer Reviews
Decision·ICLR 2025 Poster
1: The MPAD framework leverages Chain-of-Thought Prompting to generate prompts with fine-grained attributes, enabling diverse and representative data synthesis for few-shot object detection. 2: The proposed method is easy to follow and not limited to specific object detection architectures, so the technique can be applied in different scenarios without further modifications. 3: The main results on PASCAL VOC and COCO few-shot datasets illustrate that the proposed method can improve non-
1: The description that "this work is the first ..." in Lines 81-83 is a little bit over-claimed. From my perspective, there is already a large amount of work exploring using large-scale pretrained diffusion models for object detection by generation (like general object detection and corner case generation for autonomous driving). Simply extending the setting to few-shot object detection isn't that significant to me. 2: Section 2.3 (CoT prompting for object synthesis) actually belongs to prompt
+ From the experimental data, the current data augmentation methods show a certain degree of performance improvement. + Integrating various mainstream generative models and zero-shot learning models, including diffusion, CLIP, etc.
- The present approach heavily depends on utilizing pre-trained models to select typical and challenging samples, potentially causing interference when assessing the efficacy of augmentation strategies. - Additional clarification is needed regarding the fairness of the experiments.
1. The CPOS and HPAS introduce novel ways to leverage both typical and hard samples, leading to a more representative synthetic dataset. 2. The use of BAP to generate diverse backgrounds helps enhance detection accuracy by allowing the model to distinguish between foreground and background more effectively. 3. The proposed framework achieves notable gains over state-of-the-art baselines on multiple FSOD benchmarks, particularly in challenging low-shot settings.
1. The framework combines multiple advanced techniques, including diffusion models, harmonic prompt scheduling, and complex background sampling, which may make it challenging for practitioners to implement effectively in real-world scenarios. At the same time, it increases the complexity of the model. Please analyze the model complexity and real-time inference of the generated model. 2. The paper demonstrates performance gains, but it would benefit from a more granular analysis comparing the eff
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques · Image and Object Detection Techniques
Methods: Diffusion
