UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation
Ping Zhou, Haoyu Wang, Mengmeng Zheng, Lei Zhang, Wei Wei, Chen Ding, Fei Zhou

TL;DR
UniTriGen is a novel framework that generates aligned VIS-IR-Label triplets for RGB-T semantic segmentation, improving data diversity and model performance with limited real data.
Contribution
UniTriGen introduces a unified triplet generation mechanism, built on a diffusion process and modality-specific adapters, to address consistency and bias issues in triplet synthesis.
Findings
Generated triplets are spatially aligned and semantically consistent.
The generated triplets enhance segmentation performance across multiple models.
The method is effective in limited-data scenarios, with balanced scene and class diversity.
Abstract
RGB-T semantic segmentation requires strictly aligned VIS-IR-Label triplets; however, such aligned triplet data are often scarce in real-world scenarios. Existing generative augmentation methods usually adopt cascaded generation paradigms, decomposing joint triplet generation into local conditional processes. As a result, consistency among VIS, IR, and Label in spatial structure, semantic content, and cross-modal details cannot be reliably maintained. To address this issue, we propose UniTriGen, a unified triplet generation framework that directly generates spatially aligned, semantically consistent, and modality-complementary VIS-IR-Label triplets under the guidance of text prompts. UniTriGen first introduces a unified triplet generation mechanism, where VIS, IR, and Label are jointly encoded into a shared latent space and modeled with a diffusion process to enforce global cross-modal…
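The idea of placing VIS, IR, and Label in one shared latent and noising them together can be illustrated with a minimal sketch. This is not the authors' implementation: the channel-wise concatenation, the linear DDPM-style noise schedule, and the helper names `make_alpha_bar`, `joint_latent`, `noise_jointly`, and `split_latent` are all assumptions made for illustration, and the per-modality latents are random stand-ins for the output of encoders the abstract does not specify.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule (standard DDPM choice, not taken from the paper).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def joint_latent(z_vis, z_ir, z_label):
    # Assumption: the "shared latent space" is modeled here as channel-wise
    # concatenation of three (C, H, W) latents into one (3C, H, W) tensor.
    return np.concatenate([z_vis, z_ir, z_label], axis=0)

def noise_jointly(z0, t, alpha_bar, rng):
    # Forward diffusion q(z_t | z_0) applied to the whole triplet at once,
    # so all three modalities share the same timestep t.
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps

def split_latent(z):
    # Split a joint latent back into its VIS, IR, and Label parts.
    return np.split(z, 3, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    # Random stand-ins for encoder outputs (hypothetical 4-channel, 8x8 latents).
    z_vis, z_ir, z_label = (rng.standard_normal((4, 8, 8)) for _ in range(3))
    z0 = joint_latent(z_vis, z_ir, z_label)
    zt, eps = noise_jointly(z0, t=500, alpha_bar=alpha_bar, rng=rng)
    print(z0.shape, zt.shape)
```

In a cascaded pipeline each modality would be generated by a separate conditional step; in this sketch a single denoiser would instead see the full joint latent at every timestep, which is one way the global cross-modal consistency described in the abstract could be encouraged.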
