Not All Birds Look The Same: Identity-Preserving Generation For Birds
Aaron Sun, Oindrila Saha, Subhransu Maji

TL;DR
This paper introduces a new bird dataset and benchmark for evaluating identity-preserving image generation. It shows that current models struggle to preserve identity on birds and that training on images grouped by species, age, and sex improves results.
Contribution
The creation of the NABLA dataset and benchmark for evaluating identity preservation in bird image generation, addressing a gap in fine-grained, non-rigid object modeling.
Findings
State-of-the-art models fail to preserve identity on the NABLA dataset.
Training on images grouped by species, age, and sex improves identity preservation.
The benchmark enables better evaluation of fine-grained, identity-preserving generation methods.
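To make the grouping idea concrete, here is a minimal sketch of how images could be bucketed by species, age, and sex and then paired within each bucket. This is not the authors' pipeline: the record fields (`path`, `species`, `age`, `sex`), the function names, and the choice to form ordered (reference, target) pairs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def group_by_attributes(records):
    """Bucket image paths by their (species, age, sex) labels.

    `records` is a hypothetical list of dicts; the field names are assumed.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["species"], rec["age"], rec["sex"])
        groups[key].append(rec["path"])
    return dict(groups)


def make_training_pairs(groups):
    """Build ordered (reference, target) pairs of distinct images per group."""
    pairs = []
    for paths in groups.values():
        # A group with a single image contributes no pairs.
        pairs.extend(permutations(paths, 2))
    return pairs


# Toy, made-up records used only to exercise the functions.
records = [
    {"path": "a.jpg", "species": "Northern Cardinal", "age": "adult", "sex": "male"},
    {"path": "b.jpg", "species": "Northern Cardinal", "age": "adult", "sex": "male"},
    {"path": "c.jpg", "species": "Northern Cardinal", "age": "adult", "sex": "female"},
]

groups = group_by_attributes(records)
pairs = make_training_pairs(groups)
```

In this toy example the two adult male images form a group and yield pairs in both directions, while the lone adult female image yields none. Pairing only within a group is one plausible way to give a model examples that share the visual traits relevant to identity.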
Abstract
Since the advent of controllable image generation, increasingly rich modes of control have enabled greater customization and accessibility for everyday users. Zero-shot, identity-preserving models such as Insert Anything and OminiControl now support applications like virtual try-on without requiring additional fine-tuning. While these models may be fitting for humans and rigid everyday objects, they still have limitations for non-rigid or fine-grained categories. These domains often lack accessible, high-quality data -- especially videos or multi-view observations of the same subject -- making them difficult both to evaluate and to improve upon. Yet, such domains are essential for moving beyond content creation toward applications that demand accuracy and fine detail. Birds are an excellent domain for this task: they exhibit high diversity, require fine-grained cues for identification,…
