# DCLTV: An Improved Dual-Condition Diffusion Model for Laser-Visible Image Translation

**Authors:** Xiaoyu Zhang, Laixian Zhang, Huichao Guo, Haijing Zheng, Houpeng Sun, Yingchun Li, Rong Li, Chenglong Luan, Xiaoyun Tong

PMC · DOI: 10.3390/s25030697 · Sensors (Basel, Switzerland) · 2025-01-24

## TL;DR

This paper introduces DCLTV, a diffusion model for laser-to-visible image translation that limits the randomness of diffusion sampling through two condition controls. It also presents a new dataset of 665 paired laser-visible images.

## Contribution

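DCLTV applies dual-condition control to a diffusion model: a Brownian bridge strategy serves as the first condition, and interpolation-based conditional injection serves as the second. The authors also built a dataset of 665 laser-visible image pairs to offset the lack of data for this task.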

## Key findings

- DCLTV outperformed five baseline models (Pix2pix, BigColor, CT2, ColorFormer, and DDColor), reducing FID by at least 15.89% and LPIPS by at least 22.02%.
- A dataset of 665 laser-visible image pairs was created to address data scarcity in this domain.
- Ablation experiments confirmed the effectiveness of the dual-condition strategy in DCLTV, with best results of FID 154.74 and LPIPS 0.379.
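
To make the "at least X% lower" figures concrete, here is a minimal sketch of how a relative reduction in a lower-is-better metric such as FID or LPIPS is typically computed. The function name and the example scores are hypothetical and are not taken from the paper. "At least" would then refer to the smallest reduction across the five baselines, assuming each reduction is measured against that baseline's own score.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`.

    Intended for lower-is-better metrics such as FID and LPIPS.
    """
    return 100.0 * (baseline - ours) / baseline


# Hypothetical scores for illustration only, not values from the paper.
print(round(relative_reduction(200.0, 150.0), 2))  # 25.0
```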

## Abstract

Laser active imaging systems can remedy the shortcomings of visible light imaging systems in difficult imaging circumstances, thereby attaining clear images. However, laser images exhibit significant modal discrepancy in contrast to the visible image, impeding human perception and computer processing. Consequently, it is necessary to translate laser images to visible images across modalities. Existing cross-modal image translation algorithms are plagued with issues, including difficult training and color bleeding. In recent studies, diffusion models have demonstrated superior image generation and translation abilities and been shown to be capable of generating high-quality images. To achieve more accurate laser-visible image translation, we designed an improved diffusion model, called DCLTV, which limits the randomness of diffusion models by means of dual-condition control. We incorporated the Brownian bridge strategy to serve as the first condition control and employed interpolation-based conditional injection to function as the second condition control. We also established a dataset comprising 665 pairs of laser-visible images to compensate for the data deficiency in the field of laser-visible image translation. Compared to five representative baseline models, namely Pix2pix, BigColor, CT2, ColorFormer, and DDColor, the proposed DCLTV achieved the best performance in terms of both qualitative and quantitative comparisons, realizing at least a 15.89% reduction in FID and at least a 22.02% reduction in LPIPS. We further validated the effectiveness of the dual conditions in DCLTV through ablation experiments, achieving the best results with an FID of 154.74 and an LPIPS of 0.379.
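
The Brownian bridge strategy named in the abstract can be illustrated with a short sketch of a bridge-style forward process. This summary does not give DCLTV's actual equations, so the sketch assumes the commonly used Brownian bridge diffusion form, in which the sample at step `t` is a linear blend of the target image and the conditioning image plus noise whose variance vanishes at both endpoints. The function name, the linear schedule `m_t = t / T`, and the variance scale `s` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def brownian_bridge_sample(x0, y, t, T, s=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at x0 (t = 0) and y (t = T).

    x0: target-domain image (here, the visible image), as a float array.
    y:  conditioning image (here, the laser image), same shape as x0.
    s:  assumed scale on the bridge variance (hypothetical parameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    m_t = t / T                             # blend weight: 0 at t = 0, 1 at t = T
    delta_t = 2.0 * s * (m_t - m_t ** 2)    # variance: zero at both endpoints
    eps = rng.standard_normal(x0.shape)     # Gaussian noise
    return (1.0 - m_t) * x0 + m_t * y + np.sqrt(delta_t) * eps
```

Because the process is pinned to the conditioning image at `t = T`, a reverse process under this formulation would start from the laser image itself rather than from pure noise, which is one way a bridge can reduce sampling randomness.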

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11820310/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11820310/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC11820310/full.md

---
Source: https://tomesphere.com/paper/PMC11820310