UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation

Ping Zhou; Haoyu Wang; Mengmeng Zheng; Lei Zhang; Wei Wei; Chen Ding; Fei Zhou

arXiv:2605.14626·cs.CV·May 15, 2026

TL;DR

UniTriGen is a novel framework that generates aligned VIS-IR-Label triplets for RGB-T semantic segmentation, improving data diversity and model performance with limited real data.

Contribution

The paper introduces a unified triplet generation mechanism, built on a diffusion process and modality-specific adapters, to address consistency and bias issues in triplet synthesis.

Findings

01 The generated triplets are spatially aligned and semantically consistent.

02 Training with the generated triplets improves segmentation performance across multiple models.

03 The method is effective in limited-data scenarios, with balanced scene and class diversity.

Abstract

RGB-T semantic segmentation requires strictly aligned VIS-IR-Label triplets; however, such aligned triplet data are often scarce in real-world scenarios. Existing generative augmentation methods usually adopt cascaded generation paradigms, decomposing joint triplet generation into local conditional processes. As a result, consistency among VIS, IR, and Label in spatial structure, semantic content, and cross-modal details cannot be reliably maintained. To address this issue, we propose UniTriGen, a unified triplet generation framework that directly generates spatially aligned, semantically consistent, and modality-complementary VIS-IR-Label triplets under the guidance of text prompts. UniTriGen first introduces a unified triplet generation mechanism, where VIS, IR, and Label are jointly encoded into a shared latent space and modeled with a diffusion process to enforce global cross-modal…
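To make the idea of joint modeling in a shared latent space concrete, the sketch below stacks three per-modality latents into one tensor and applies a standard DDPM-style forward noising step with a single timestep shared by all modalities. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the latents are combined or which noise schedule is used, so channel-wise stacking, the linear beta schedule, and every function name here (`make_alpha_bar`, `joint_latent`, `forward_noise`, `split_latent`) are hypothetical. Plain Python lists stand in for tensors.

```python
import math
import random

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (an assumed choice)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def joint_latent(z_vis, z_ir, z_label):
    """Stack per-modality latents (each a list of channels) into one shared latent.

    Channel-wise stacking is an assumption made for illustration.
    """
    return z_vis + z_ir + z_label

def forward_noise(z0, t, alpha_bar, rng):
    """Sample z_t from q(z_t | z_0) using ONE timestep for every channel.

    Because all modalities share t, they are corrupted to the same noise level,
    which is what lets a single denoiser model them jointly.
    """
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    eps = [[rng.gauss(0.0, 1.0) for _ in channel] for channel in z0]
    zt = [[signal * x + sigma * e for x, e in zip(ch, e_ch)]
          for ch, e_ch in zip(z0, eps)]
    return zt, eps

def split_latent(z, sizes):
    """Undo joint_latent: sizes gives the channel count of each modality."""
    parts, start = [], 0
    for n in sizes:
        parts.append(z[start:start + n])
        start += n
    return parts
```

By contrast, a cascaded pipeline would noise and denoise each modality in a separate conditional process; here a denoiser would see all three modalities at once and could be trained to predict `eps` for the whole stacked latent before `split_latent` hands each part to its own decoder.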
