Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation

Buddhi Wijenayake; Nichula Wasalathilake; Roshan Godaliyadda; Vijitha Herath; Parakrama Ekanayake; Vishal M. Patel

arXiv:2602.04749 · cs.CV · April 28, 2026

TL;DR

This paper introduces a prompt-controlled diffusion augmentation method that generates targeted synthetic data to address class imbalance and domain shifts in remote-sensing image segmentation.

Contribution

It presents a novel diffusion-based framework for controlled, targeted data augmentation that improves segmentation of minority classes across domains.

Findings

01. Synthetic data improves segmentation accuracy, especially for minority classes.

02. The method enhances performance under domain shift conditions.

03. Controlled augmentation outperforms indiscriminate dataset expansion.

Abstract

Long-tailed class imbalance remains a fundamental obstacle in semantic segmentation of high-resolution remote-sensing imagery, where dominant classes shape learned representations and rare classes are systematically under-segmented. This challenge becomes more acute in cross-domain settings such as LoveDA, which exhibits an explicit Urban/Rural split with substantial appearance differences and inconsistent class-frequency statistics across domains. We propose a prompt-controlled diffusion augmentation framework that generates paired label-image samples with explicit control over semantic composition and domain, enabling targeted enrichment of underrepresented classes rather than indiscriminate dataset expansion. A domain-aware, masked, ratio-conditioned discrete diffusion model first synthesizes layouts that satisfy class-ratio targets while preserving realistic spatial co-occurrence,…
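The idea of conditioning layout generation on class-ratio targets can be illustrated with a small sketch. The code below is not the authors' implementation: the class names follow the usual LoveDA label set, and the helper names (`class_ratios`, `boosted_target`, `build_prompt`), the boosting rule, and the prompt format are all hypothetical, chosen only to show how a ratio target that enriches a rare class might be derived from an existing label mask.

```python
from collections import Counter

# Illustrative LoveDA-style class names; the exact label set is an assumption here.
CLASSES = ["background", "building", "road", "water", "barren", "forest", "agriculture"]

def class_ratios(mask, num_classes):
    """Return the fraction of pixels belonging to each class in a 2D label mask."""
    counts = Counter(v for row in mask for v in row)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in range(num_classes)]

def boosted_target(ratios, rare_ids, boost=0.2):
    """Raise each rare class to at least `boost`, then renormalize so ratios sum to 1.

    This boosting rule is a hypothetical choice, not the paper's method.
    """
    raised = [max(r, boost) if c in rare_ids else r for c, r in enumerate(ratios)]
    total = sum(raised)
    return [r / total for r in raised]

def build_prompt(domain, target, min_share=0.01):
    """Format a hypothetical text prompt: the domain plus per-class percentage targets."""
    parts = [f"{CLASSES[c]} {round(100 * r)}%" for c, r in enumerate(target) if r >= min_share]
    return f"{domain} scene: " + ", ".join(parts)

# Toy 4x5 mask: 12 background pixels, 6 building pixels, 2 barren pixels.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 4, 4],
]
observed = class_ratios(mask, len(CLASSES))
target = boosted_target(observed, rare_ids={4}, boost=0.3)
prompt = build_prompt("rural", target)
```

In this toy case the rare class (barren) rises from 10% of pixels in the observed mask to roughly 25% in the target, which is the kind of targeted enrichment the abstract contrasts with indiscriminate dataset expansion.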

Code & Models

Models

buddhi19/SyntheticGen
model · ♡ 1

Datasets

buddhi19/SyntheticGenV5
dataset · 79 dl