Enrich the content of the image Using Context-Aware Copy Paste
Qiushi Guo

TL;DR
This paper introduces a context-aware copy-paste data augmentation method that leverages advanced models like BLIP, SAM, and YOLO to improve the realism and relevance of augmented images without manual annotation.
Contribution
It presents a novel automated approach integrating BLIP, SAM, and YOLO for context-aware content augmentation, addressing limitations of existing copy-paste techniques.
Findings
Enhanced data diversity across datasets
Improved quality of generated pseudo-images
Effective in multiple computer vision tasks
Abstract
Data augmentation remains a widely used technique in deep learning, particularly in tasks such as image classification, semantic segmentation, and object detection. Among these techniques, Copy-Paste is a simple yet effective method that has gained considerable attention recently. However, existing Copy-Paste methods often overlook the contextual relevance between source and target images, resulting in inconsistencies in the generated outputs. To address this challenge, we propose a context-aware approach that integrates Bootstrapping Language-Image Pre-training (BLIP) for content extraction from source images. By matching the extracted content information with category information, our method ensures cohesive integration of target objects using the Segment Anything Model (SAM) and You Only Look Once (YOLO). This approach eliminates the need for manual annotation, offering an automated and user-friendly solution. Experimental…
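The matching and pasting steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the caption string stands in for BLIP output, the binary mask stands in for a SAM segmentation, and the function names `match_categories` and `paste_object`, the whole-word matching rule, and the nested-list image format are all assumptions made for illustration. Object detection with YOLO is omitted entirely.

```python
import re

def match_categories(caption, categories):
    """Return the categories whose name occurs as a whole word in the caption.

    Hypothetical matching rule; the paper may match content differently.
    """
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [c for c in categories if c.lower() in words]

def paste_object(target, source, mask, top, left):
    """Copy source pixels where mask is 1 onto a copy of target at (top, left).

    Images are nested lists of pixel values; pixels that would fall outside
    the target are skipped. The target itself is not modified.
    """
    out = [row[:] for row in target]
    for i, mask_row in enumerate(mask):
        for j, keep in enumerate(mask_row):
            y, x = top + i, left + j
            if keep and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = source[i][j]
    return out

# Stand-in for a caption that a captioning model such as BLIP might produce.
caption = "a dog running on the grass in a park"
categories = ["dog", "car", "grass", "boat"]
matched = match_categories(caption, categories)

# Toy 3x4 target image, 2x2 source object, and a stand-in segmentation mask.
target = [[0] * 4 for _ in range(3)]
source = [[7, 7], [7, 7]]
mask = [[1, 0], [1, 1]]
pasted = paste_object(target, source, mask, top=1, left=2)
```

In this sketch only objects from the matched categories would be candidates for pasting, which is one simple way to keep pasted content consistent with the scene the caption describes.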
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · simple Copy-Paste
