GRASP: Guided Residual Adapters with Sample-wise Partitioning
Felix Nützel, Mischa Dombrowski, Bernhard Kainz

TL;DR
GRASP pairs a static partition of the conditioning space with group-specific residual adapters to improve long-tail class generation in text-to-image flow-matching transformers, raising fidelity, diversity, and tail-class coverage.
Contribution
The paper proposes a static, sample-wise partitioning scheme with residual adapters that improves long-tail class generation without altering the flow-matching objective or the sampler.
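To make the idea concrete, the following is a minimal sketch of a feedforward layer with one residual adapter per partition group. It is an illustration under stated assumptions, not the authors' implementation: the low-rank down/up adapter shape, the zero initialization of the up projection, the class name `PartitionedFFN`, and the lookup table `GROUP_OF_CLASS` are all hypothetical choices made for this example.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class PartitionedFFN:
    """Shared feedforward layer plus one residual adapter per group.

    Assumption: each adapter is a low-rank down/up projection added to the
    shared output. The source does not specify the adapter architecture.
    """

    def __init__(self, dim, hidden, rank, num_groups, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(hidden, dim, rng)
        self.w_out = rand_matrix(dim, hidden, rng)
        self.down = [rand_matrix(rank, dim, rng) for _ in range(num_groups)]
        # Zero-initialized up projections: every adapter starts as a no-op.
        self.up = [[[0.0] * rank for _ in range(dim)] for _ in range(num_groups)]

    def base(self, x):
        """Output of the shared feedforward path alone."""
        return matvec(self.w_out, relu(matvec(self.w_in, x)))

    def forward(self, x, group):
        """Shared output plus the residual from the selected group's adapter."""
        delta = matvec(self.up[group], matvec(self.down[group], x))
        return [b + d for b, d in zip(self.base(x), delta)]

# Hypothetical static partition: a fixed table from condition label to group.
# Because it never changes, each sample's adapter is known before training.
GROUP_OF_CLASS = {"common_a": 0, "common_b": 0, "rare_c": 1}

layer = PartitionedFFN(dim=4, hidden=8, rank=2, num_groups=2)
x = [0.5, -1.0, 0.25, 2.0]
out = layer.forward(x, GROUP_OF_CLASS["rare_c"])
```

With the zero initialization assumed here, `out` equals `layer.base(x)` until the adapter for group 1 is trained, and updates to that adapter never touch the outputs for samples routed to group 0.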
Findings
Reduces overall FID by up to 80%
Increases tail-class coverage by up to 44%
Outperforms alternative methods on medical datasets and ImageNet-LT
Abstract
Text-to-image flow-matching transformers degrade sharply in long-tail settings: tail-class outputs collapse in fidelity and diversity, limiting their value as synthetic augmentation for rare conditions. We trace this to low head-versus-tail gradient alignment during fine-tuning, an optimization-level pathology that conditioning- and sampling-side interventions do not address. We propose GRASP (Guided Residual Adapters with Sample-wise Partitioning): a deterministic partition of the conditioning space, paired with group-specific residual adapters in the transformer feedforward layers, that leaves the flow-matching objective and the sampler untouched. In conditional flow matching, condition values index distinct sets of probability paths, so partitioning along the conditioning is the structurally correct factorization and a suitable proxy for gradient alignment. Because the partition is static,…
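The head-versus-tail gradient alignment mentioned in the abstract can be illustrated with a small diagnostic. The sketch below assumes alignment is measured as the cosine similarity between the flattened gradient from a head-class batch and the one from a tail-class batch. That metric, the function name `cosine_alignment`, and the toy gradient vectors are assumptions for illustration; the excerpt does not state how the authors quantify alignment.

```python
import math

def cosine_alignment(g_head, g_tail):
    """Cosine similarity between two flattened gradient vectors.

    Values near 1 mean head and tail updates point the same way; values
    near 0 or below mean a shared update helps one group little or hurts it.
    """
    dot = sum(a * b for a, b in zip(g_head, g_tail))
    norm = math.sqrt(sum(a * a for a in g_head)) * math.sqrt(sum(b * b for b in g_tail))
    return dot / norm if norm > 0 else 0.0

# Toy gradients (hypothetical values, not taken from the paper).
g_head = [1.0, 0.0, 2.0]
aligned = cosine_alignment(g_head, [2.0, 0.0, 4.0])     # same direction
orthogonal = cosine_alignment(g_head, [0.0, 3.0, 0.0])  # no shared direction
```

In practice the two vectors would come from backpropagating the same loss on a head-class batch and on a tail-class batch over the shared parameters.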
