Personalized Representation from Personalized Generation
Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola

TL;DR
This paper investigates how personalized synthetic data generated by diffusion models can enhance fine-grained, data-scarce personalized vision tasks through a contrastive learning approach, improving various downstream applications.
Contribution
It formalizes the challenge of learning personalized representations from synthetic data and introduces a new evaluation suite and contrastive learning method for this purpose.
Findings
Improved personalized representations for recognition and segmentation.
Synthetic data enhances performance in data-scarce personalized tasks.
Analysis of image generation methods critical for effective personalization.
Abstract
Modern vision models excel at general purpose downstream tasks. It is unclear, however, how they may be used for personalized vision tasks, which are both fine-grained and data-scarce. Recent works have successfully applied synthetic data to general-purpose representation learning, while advances in T2I diffusion models have enabled the generation of personalized images from just a few real examples. Here, we explore a potential connection between these ideas, and formalize the challenge of using personalized synthetic data to learn personalized representations, which encode knowledge about an object of interest and may be flexibly applied to any downstream task relating to the target object. We introduce an evaluation suite for this challenge, including reformulations of two existing datasets and a novel dataset explicitly constructed for this purpose, and propose a contrastive…
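The abstract is cut off before it describes the contrastive objective, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss, not the authors' formulation. It assumes the anchor and positive are embeddings of images showing the target object (for example, personalized synthetic images) and the negatives are embeddings of other objects. The function names, the toy 2-D embeddings, and the temperature of 0.1 are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single anchor (hypothetical helper).

    Returns -log( exp(s_pos) / (exp(s_pos) + sum_k exp(s_neg_k)) ),
    where each s is a cosine similarity divided by the temperature.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Toy embeddings (assumed values, purely for illustration).
anchor = [1.0, 0.0]                      # image of the target object
positive = [0.9, 0.1]                    # another image of the same object
negatives = [[0.0, 1.0], [-1.0, 0.2]]    # images of other objects

# Loss is small when the positive is close to the anchor...
loss_good = info_nce(anchor, positive, negatives)
# ...and large when a dissimilar embedding is treated as the positive.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

Minimizing a loss of this shape pulls embeddings of the target object together and pushes embeddings of other objects away, which is the general mechanism by which a contrastive approach could specialize a pretrained backbone to one object.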
Peer Reviews
Decision: ICLR 2025 Poster
- The paper was well written and easy to follow.
- The overall idea of having a personalized vision backbone that can work well on several downstream tasks is interesting.
- The additional experiments on useful synthetic data were insightful.
- Additional baselines or comparisons.
- For example, generating additional data with augmentations instead of using T2I models is a cheap baseline. If this is done in real-aug (Tab. 2) and it is comparable to the results of Tab. 1, why does it seem to perform worse than no personalization?
- It may also be useful to compare against using real data only to get an upper bound of the method.
- It would be interesting to see if, with more real images, cut/paste would outperform Masked D
The paper presents a creative combination of generative models and contrastive learning to address the challenge of instance-specific visual representation with minimal real data. The experimental framework is thorough, covering classification, retrieval, detection, and segmentation tasks across three datasets (DeepFashion2, DogFaceNet, and the newly introduced PODS). The results consistently highlight the advantage of personalized representations over pre-trained ones, demonstrating the robus
1. It looks unnecessary to exclude real negatives in the proposed setting of personalized representation learning. Unlike real positives, real negatives might be easily obtained directly from open-source data. The paper relies on generated negatives produced by the generative model, which can be computationally costly. Alternatively, obtaining real negatives from readily available online sources might be a more efficient solution. Could the authors provide additional insights into why they exclu
1. The paper explores a novel and intriguing problem: learning personalized representations, which could be valuable for downstream tasks related to target objects. The setting is innovative and promising.
2. The proposed method is both simple and effective, as the authors employ an image customization technique to augment the dataset and address the challenge of limited training data.
3. Overall, the paper is well-structured and easy to read.
1. While the problem being studied is intriguing, the overall approach is relatively basic. The concepts of using image generation to augment the dataset and contrastive learning are not novel.
2. Additionally, incorporating image generation could significantly increase the cost of the method.
3. Moreover, in certain tasks, the method results in a decline in performance.
Taxonomy
Topics: Distributed and Parallel Computing Systems
Methods: Diffusion · Contrastive Learning
