Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation

Youngwan Jin; Incheol Park; Hanbin Song; Hyeongjin Ju; Yagiz Nalcakan; and Shiho Kim

arXiv:2409.16706·cs.CV·April 24, 2025

TL;DR

Pix2Next introduces a novel framework that leverages vision foundation models and cross-attention mechanisms to generate high-quality NIR images from RGB inputs, improving realism and utility for computer vision tasks.

Contribution

The paper presents a new RGB-to-NIR translation method that combines a vision foundation model with cross-attention, a multi-scale PatchGAN discriminator, and specialized loss functions, and reports better results than existing approaches.

Findings

01 34.81% FID score improvement over existing methods

02 Enhanced NIR image quality demonstrated on the RANUS dataset

03 Improved downstream object detection performance using generated NIR images
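
The FID percentage in finding 01 can be read as a relative reduction of a lower-is-better score. The short sketch below shows that arithmetic under this assumption; the function name `relative_improvement` and the two FID values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of a lower-is-better metric such as FID.

    Assumption: the improvement is measured relative to the baseline score.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic.
baseline_fid = 50.0
ours_fid = 40.0
improvement = relative_improvement(baseline_fid, ours_fid)
print(f"{improvement:.2f}% lower FID")
```

Under this reading, a 34.81% improvement means the new FID is roughly 65% of the baseline FID.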

Abstract

This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our approach leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder-decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed global representations and preserves essential spectral characteristics, treating RGB-to-NIR translation as more than a simple domain transfer problem. A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation. We performed experiments on the RANUS dataset to demonstrate Pix2Next's advantages in quantitative metrics and visual quality, improving the FID…
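
To make the multi-scale PatchGAN idea from the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation. It assumes a pix2pixHD-style layout: several small patch discriminators, each seeing a progressively downsampled copy of the RGB image concatenated with the NIR image. The class names, layer counts, channel widths, number of scales, and the choice to condition on the RGB input are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDiscriminator(nn.Module):
    """Small PatchGAN: outputs a grid of real/fake logits, one per image patch."""

    def __init__(self, in_channels: int, base: int = 32):  # widths are hypothetical
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # per-patch logit map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiScaleDiscriminator(nn.Module):
    """Runs one PatchDiscriminator per scale on a downsampled image pyramid."""

    def __init__(self, in_channels: int, num_scales: int = 3):
        super().__init__()
        self.discs = nn.ModuleList(
            [PatchDiscriminator(in_channels) for _ in range(num_scales)]
        )

    def forward(self, rgb: torch.Tensor, nir: torch.Tensor) -> list:
        # Assumption: the discriminator is conditioned on the RGB input.
        x = torch.cat([rgb, nir], dim=1)
        outputs = []
        for disc in self.discs:
            outputs.append(disc(x))
            # Halve the resolution before the next, coarser discriminator.
            x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1,
                             count_include_pad=False)
        return outputs


rgb = torch.randn(1, 3, 64, 64)   # dummy RGB batch
nir = torch.randn(1, 1, 64, 64)   # dummy single-channel NIR batch
model = MultiScaleDiscriminator(in_channels=4)
logits = model(rgb, nir)
print([tuple(t.shape) for t in logits])
```

Each returned tensor is a logit map whose cells judge overlapping patches; the coarser scales cover larger image regions per cell, which is how a design of this kind can assess realism at several levels of detail.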

Code & Models

Repositories

Yonsei-STL/pix2next (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: PatchGAN