Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation

Gautier Evennou; Antoine Chaffin; Vivien Chappelier; Ewa Kijak

arXiv:2412.15939·cs.CV·October 15, 2025

TL;DR

This paper introduces BLIP2IDC, an efficient adaptation of BLIP2 for image difference captioning, and proposes synthetic augmentation to enhance dataset quality and model performance on real-world images.

Contribution

The paper presents a novel low-cost method for adapting BLIP2 to IDC and introduces synthetic augmentation to improve both IDC datasets and the models trained on them.

Findings

01. BLIP2IDC outperforms two-stream approaches on real-world IDC datasets.

02. Synthetic augmentation creates high-quality data for IDC.

03. The new Syned dataset is well-suited for IDC tasks.

Abstract

The improvement in generative model quality over the past years has enabled the generation of edited variations of images at a large scale. To counter the harmful effects of such technology, the Image Difference Captioning (IDC) task aims to describe the differences between two images. While this task is handled successfully for simple 3D rendered images, models struggle on real-world images. The reason is twofold: the scarcity of training data, and the difficulty of capturing fine-grained differences between complex images. To address these issues, we propose a simple yet effective framework to both adapt existing image captioning models to the IDC task and augment IDC datasets. We introduce BLIP2IDC, an adaptation of BLIP2 to the IDC task at low computational cost, and show that it outperforms two-stream approaches by a significant margin on real-world IDC datasets. We also propose…
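The abstract contrasts BLIP2IDC with two-stream approaches, which encode each image separately. As a rough illustration of what a single-stream input could look like, the sketch below joins an image pair into one array that a single captioning encoder could consume. This is an assumption made for illustration only: this page does not describe how BLIP2IDC builds its input, and `join_pair`, the side-by-side layout, and the 224x224 size are all hypothetical.

```python
import numpy as np


def join_pair(before: np.ndarray, after: np.ndarray, axis: int = 1) -> np.ndarray:
    """Concatenate two HxWxC images into one array.

    Hypothetical helper, not taken from the BLIP2IDC code base.
    axis=1 places the images side by side; axis=0 stacks them vertically.
    """
    if before.shape != after.shape:
        raise ValueError("both images must share the same shape")
    return np.concatenate([before, after], axis=axis)


# Toy image pair: the "after" image differs by a white square (a simulated edit).
before = np.zeros((224, 224, 3), dtype=np.uint8)
after = before.copy()
after[50:100, 50:100] = 255

pair = join_pair(before, after)
print(pair.shape)
```

With a joined input like this, one encoder sees both images at once, whereas a two-stream model would have to compare two separately computed representations.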

Code & Models

Repositories

gautierevn/blip2idc (PyTorch, official)