MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations

Reza Akbarian Bafghi; Nidhin Harilal; Claire Monteleoni; Maziar Raissi

arXiv:2406.12368·cs.CV·December 6, 2024

TL;DR

MixDiff is a self-supervised learning (SSL) pre-training framework that mixes real and synthetic images, yielding representations that are more robust and transfer better across domains. It raises SimCLR's ImageNet-1K accuracy by 4.56%.

Contribution

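MixDiff replaces one augmented instance of each real image with a synthetic image generated by a variant of Stable Diffusion, so the model learns representations that span real and synthetic images. It also reaches comparable performance without any augmentations, which are typically crucial in SSL.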

Findings

01 Boosts SimCLR's ImageNet-1K accuracy by 4.56%

02 Improves SimCLR, BarlowTwins, and DINO across robustness datasets and domain transfer tasks

03 Reaches comparable performance without any augmentations

Abstract

This paper introduces MixDiff, a new self-supervised learning (SSL) pre-training framework that combines real and synthetic images. Unlike traditional SSL methods that predominantly use real images, MixDiff uses a variant of Stable Diffusion to replace an augmented instance of a real image, facilitating the learning of cross real-synthetic image representations. Our key insight is that while models trained solely on synthetic images underperform, combining real and synthetic data leads to more robust and adaptable representations. Experiments show MixDiff enhances SimCLR, BarlowTwins, and DINO across various robustness datasets and domain transfer tasks, boosting SimCLR's ImageNet-1K accuracy by 4.56%. Our framework also demonstrates comparable performance without needing any augmentations, a surprising finding in SSL where augmentations are typically crucial.
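
To make the pairing idea concrete, here is a minimal sketch of a SimCLR-style contrastive (NT-Xent) loss in which the two views of each sample are a real image's embedding and the embedding of its synthetic counterpart. It is an illustration under stated assumptions, not the authors' implementation: the function name `nt_xent`, the temperature value, and the use of plain Python lists are all hypothetical, and the encoder and the Stable Diffusion generation step are omitted entirely.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nt_xent(real_emb, synth_emb, temperature=0.5):
    """NT-Xent loss over real/synthetic positive pairs.

    Assumption: row i of `real_emb` and row i of `synth_emb` are embeddings
    of a real image and of the synthetic image generated for it, so they
    form a positive pair. All other embeddings in the batch are negatives.
    """
    n = len(real_emb)
    z = [l2_normalize(v) for v in real_emb + synth_emb]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the paired view
        # Cosine similarities to every other embedding, scaled by temperature.
        logits = [dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        pos_logit = dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        total += lse - pos_logit  # cross-entropy with the positive as target
    return total / (2 * n)
```

Calling `nt_xent` with synthetic embeddings that line up with their real counterparts gives a lower loss than calling it with the same embeddings shuffled, which is the signal that pulls real and synthetic views of the same content together.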

Code & Models

Repositories

cryptonymous9/mixing-ssl (PyTorch, official)

Taxonomy

Topics: Image Processing Techniques and Applications · Image Retrieval and Classification Techniques

Methods: Linear Layer · Multi-Head Attention · Residual Connection · Softmax · Average Pooling · Layer Normalization · Global Average Pooling · Attention Is All You Need