Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images
Aron Yu, Kristen Grauman

TL;DR
This paper uses synthetic images from a state-of-the-art image generation engine to augment the training data for visual comparison tasks, improving attribute ranking on fine-grained visual differences.
Contribution
The paper addresses the sparsity of supervision in visual comparison learning by augmenting real training image pairs with synthetically generated pairs that show slight modifications of individual attributes.
Findings
Augmenting real training pairs with synthetic pairs improves attribute ranking accuracy.
The method is effective on datasets of faces and fashion images.
Bootstrapping with imperfect image generators mitigates sample sparsity.
Abstract
Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains non-trivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome the sparsity of supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning…
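The training recipe in the abstract (real comparison pairs plus synthetic pairs that differ slightly in one attribute) can be illustrated with a minimal sketch. The abstract does not say which ranking model or loss the authors use, so this sketch assumes a linear ranker trained with a RankNet-style logistic pairwise loss. The function `synthetic_pairs` is a hypothetical stand-in for the image generator: it shifts a feature vector along an assumed attribute direction rather than producing images. All names, dimensions, and hyperparameters are illustrative assumptions, not the paper's.

```python
import numpy as np


def num_pairs(n: int) -> int:
    """Number of unordered pairs among n images: n(n-1)/2, quadratic in n."""
    return n * (n - 1) // 2


def synthetic_pairs(features, direction, step):
    """Hypothetical stand-in for an image generator (assumption, not the paper's).

    For each feature vector x, return (x + step * direction, x): a pair whose
    first element shows slightly more of the attribute than the second.
    """
    return [(x + step * direction, x) for x in features]


def train_pairwise_ranker(pairs, dim, lr=0.5, epochs=300):
    """Fit w so that w.x_more > w.x_less for each (x_more, x_less) pair.

    Gradient descent on the mean logistic pairwise loss log(1 + exp(-w.d)),
    where d = x_more - x_less. This loss is an assumed choice.
    """
    diffs = np.array([more - less for more, less in pairs])
    w = np.zeros(dim)
    for _ in range(epochs):
        margins = diffs @ w
        weights = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margin)
        w += lr * (diffs * weights[:, None]).mean(axis=0)
    return w


def pair_accuracy(w, pairs):
    """Fraction of pairs that w orders correctly."""
    diffs = np.array([more - less for more, less in pairs])
    return float(np.mean(diffs @ w > 0))


def demo(seed=0):
    rng = np.random.default_rng(seed)
    dim = 8
    direction = np.zeros(dim)
    direction[0] = 1.0  # assumed axis along which the attribute varies
    feats = rng.normal(size=(20, dim))

    # A handful of "real" labelled pairs, ordered by attribute strength.
    real = []
    for i, j in [(0, 1), (2, 3), (4, 5)]:
        a, b = feats[i], feats[j]
        real.append((a, b) if a @ direction > b @ direction else (b, a))

    # Dense synthetic supervision: one slightly modified pair per sample.
    synth = synthetic_pairs(feats, direction, step=0.1)

    w = train_pairwise_ranker(real + synth, dim)
    return w, direction, synth
```

Calling `demo()` returns the learned weight vector, the assumed attribute direction, and the synthetic pairs, which can be passed to `pair_accuracy` to check how the ranker orders them. `num_pairs` makes the abstract's quadratic growth concrete: 1,000 images already yield 499,500 possible comparisons.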
