# Recognition Task-Based Detection Score: A Task-Oriented Evaluation Metric for Infrared Image Colorization

**Authors:** Hao Wang, Jiaming Cai, Yao Hu, Chenglong Zhang, Qun Hao

PMC · DOI: 10.3390/s26061807 · Sensors (Basel, Switzerland) · 2026-03-13

## TL;DR

A new evaluation metric for infrared image colorization, RDS, assesses quality by how well an object detector performs on the colorized images rather than by pixel-level similarity.

## Contribution

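RDS brings task-oriented evaluation to infrared colorization through three design properties: position robustness (IoU-based matching), fine-grained interpretability (category-level accuracy), and task adjustability (flexible category partitioning).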

## Key findings

- Under registration errors, RDS's discriminative Score Gap improved by 5.65%, whereas traditional metrics degraded by 11–70% (PSNR by 69.8%).
- RDS enables category-level performance diagnosis and flexible evaluation adjustments.
- Flexible category merging raised TIC-CGAN's RDS from 76.05% to 96.45% on unseen scenes, a gain of 20.4 percentage points.

## Abstract

**What are the main findings?**

A task-oriented colorization evaluation metric (RDS) is proposed that measures colorization quality through object detection performance, incorporating three key design characteristics: position robustness via IoU-based matching, fine-grained interpretability through category-level analysis, and task adjustability through flexible category partitioning strategies.
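The IoU-based matching behind RDS's position robustness can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the `Box` type and the greedy `match` helper are hypothetical, and the 0.5 IoU threshold is a common object-detection convention rather than a value taken from the paper. A detection counts as correct when it has the right class and overlaps the ground-truth box sufficiently, so a small registration offset does not by itself cause a miss.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, plus a class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def match(gts: list[Box], dets: list[Box], thr: float = 0.5) -> list[tuple[Box, Box | None]]:
    """Greedily pair each ground-truth box with the unused same-label detection of
    highest IoU (hypothetical helper); the detection is None if none reaches `thr`."""
    used: set[int] = set()
    pairs: list[tuple[Box, Box | None]] = []
    for gt in gts:
        best_j, best_iou = None, 0.0
        for j, det in enumerate(dets):
            # Only unused detections of the same class are candidates.
            if j in used or det.label != gt.label:
                continue
            v = iou(gt, det)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is not None and best_iou >= thr:
            used.add(best_j)
            pairs.append((gt, dets[best_j]))
        else:
            pairs.append((gt, None))
    return pairs
```

For example, a 100 × 50 car box and a same-class detection shifted 4 pixels to the right have IoU 4800/5200 ≈ 0.92, so they still match at a 0.5 threshold.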

Experiments demonstrate that RDS not only maintains consistency with traditional metrics under standard conditions but also exhibits superior stability under registration errors (5.65% improvement vs. 11–70% degradation) and uniquely enables category-level performance diagnosis and flexible evaluation dimension adjustment—capabilities that traditional pixel-based metrics lack.
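The contrast between pixel-level metrics and IoU-based matching under registration error can be illustrated on a toy example. This is a synthetic sketch, not the paper's experiment or its Score Gap computation: the 64 × 64 scene, the 2-pixel shift, and the box coordinates are all assumptions chosen for illustration.

```python
from __future__ import annotations

import numpy as np


def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))


def box_iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


# Synthetic scene: one bright object on a dark background.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[20:40, 10:50] = 200
# A "perfect" colorization that is misregistered by 2 pixels horizontally.
shifted = np.roll(ref, 2, axis=1)

ref_box = (10, 20, 50, 40)      # object box in the reference image
shifted_box = (12, 20, 52, 40)  # the same object, found 2 px to the right

print(psnr(ref, ref), psnr(ref, shifted))
print(box_iou(ref_box, shifted_box))
```

Here PSNR drops from infinite to about 19.2 dB purely because of the offset, while the object's box IoU stays at about 0.90, well above a typical 0.5 matching threshold.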

**What are the implications of the main findings?**

By providing evaluation criteria that directly reflect downstream task performance, RDS steers the development of infrared colorization models toward practical applicability rather than pixel-level similarity, driving innovation in model architectures and training strategies for infrared imaging applications.

The three design characteristics enable RDS to provide reliable evaluation under imperfect registration conditions, reveal category-specific model weaknesses for targeted improvements, and adapt evaluation focus to match diverse application requirements, supporting more effective model optimization in practical deployment scenarios.

Infrared image colorization has gained widespread attention in recent years as an important means of enhancing image visibility and semantic expression. However, existing evaluation methods mostly rely on pixel-level differences or feature distribution distances, failing to comprehensively reflect the usability of colorization results in practical tasks. To address this, we propose a task-oriented colorization quality evaluation metric called Recognition-Task based Detection Score (RDS), which uses the recognition accuracy of object detection models on colorized images as a proxy indicator to measure their actual performance in downstream tasks, thereby achieving consistency between image quality assessment and task performance. RDS incorporates three key characteristics in its design: enhancing position robustness through the matching mechanism of object detection tasks, providing fine-grained interpretability through category-level accuracy calculation, and achieving task adjustability through flexible category division strategies. Systematic experiments conducted on both NIR–RGB and FLIR-5C datasets demonstrate that RDS maintains good subjective–objective consistency with traditional metrics under standard registration conditions, exhibits superior stability under registration error scenarios, and possesses fine-grained interpretability and task adjustability that traditional metrics lack. RDS maintains a 5.7% improvement in discriminative Score Gap under misalignment while PSNR degrades by 69.8%, and flexible category merging raises TIC-CGAN’s RDS from 76.05% to 96.45% on unseen scenes, providing more practically valuable criteria for the evaluation and optimization of infrared colorization models.
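Category-level accuracy and flexible category merging can be sketched as follows. The exact RDS formula is not given in this summary, so the definitions here are assumptions: each ground-truth object is paired with the class of the detection matched to it by IoU (None if missed), per-category accuracy is the fraction of objects whose predicted class equals the true class, and the overall score is the unweighted mean over categories. The labels, the toy pairs, and the `merge` mapping are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def category_accuracy(
    pairs: list[tuple[str, str | None]],
    merge: dict[str, str] | None = None,
) -> dict[str, float]:
    """Per-category accuracy over ground-truth objects (assumed definition).

    Each pair is (true_label, predicted_label); predicted_label is None when the
    object was missed. `merge` optionally maps fine labels to coarser task labels.
    """
    merge = merge or {}
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for true_label, pred_label in pairs:
        gt = merge.get(true_label, true_label)
        pred = None if pred_label is None else merge.get(pred_label, pred_label)
        totals[gt] += 1
        hits[gt] += int(pred == gt)
    return {c: hits[c] / totals[c] for c in totals}


def overall_score(per_category: dict[str, float]) -> float:
    """Unweighted mean of per-category accuracies (an assumed aggregation)."""
    return sum(per_category.values()) / len(per_category)


# Toy (true class, predicted class) pairs; not data from the paper.
pairs = [("car", "car"), ("car", "truck"), ("truck", "car"),
         ("person", "person"), ("person", None)]

fine = category_accuracy(pairs)
coarse = category_accuracy(pairs, merge={"car": "vehicle", "truck": "vehicle"})
print(fine, overall_score(fine))
print(coarse, overall_score(coarse))
```

On these toy pairs the fine-grained breakdown is car 0.5, truck 0.0, person 0.5 (mean about 0.33), which points directly at the car/truck confusion. Merging car and truck into vehicle forgives that confusion and raises the mean to 0.75, illustrating in miniature how a coarser category partition can raise a model's score when the task does not need the finer distinction.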

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030208/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030208/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030208/full.md

---
Source: https://tomesphere.com/paper/PMC13030208