TL;DR
This paper introduces BLIP2IDC, an efficient adaptation of BLIP2 for image difference captioning, and proposes synthetic augmentation to enhance dataset quality and model performance on real-world images.
Contribution
It presents a novel low-cost adaptation method for IDC using BLIP2 and introduces synthetic augmentation to improve IDC datasets and models.
Findings
BLIP2IDC outperforms two-stream approaches on real-world IDC datasets.
Synthetic augmentation creates high-quality data for IDC.
The new Syned1 dataset is well-suited for IDC tasks.
Abstract
The improving quality of generative models in recent years has enabled the generation of edited variations of images at a large scale. To counter the harmful effects of such technology, the Image Difference Captioning (IDC) task aims to describe the differences between two images. While this task is handled successfully for simple 3D rendered images, existing approaches struggle on real-world images. The reason is twofold: the scarcity of training data, and the difficulty of capturing fine-grained differences between complex images. To address these issues, we propose in this paper a simple yet effective framework to both adapt existing image captioning models to the IDC task and augment IDC datasets. We introduce BLIP2IDC, an adaptation of BLIP2 to the IDC task at low computational cost, and show that it outperforms two-stream approaches by a significant margin on real-world IDC datasets. We also propose…
