# SC-CoSF: Self-Correcting Collaborative and Co-Training for Image Fusion and Semantic Segmentation

**Authors:** Dongrui Yang, Lihong Qiao, Yucheng Shu

PMC · DOI: 10.3390/s25123575 · Sensors (Basel, Switzerland) · 2025-06-06

## TL;DR

SC-CoSF couples multimodal image fusion and semantic segmentation in a single, jointly trained framework built on a weight-sharing CNN encoder, so that each task supplies useful signal to the other.

## Contribution

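SC-CoSF combines a Self-correction and Collaboration Fusion Module (Sc-CFM), two task-specific decoders (ICRM for segmentation context recovery and ReAW for fusion reconstruction), and end-to-end joint training to improve both image fusion and semantic segmentation.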

## Key findings

- SC-CoSF outperforms independently optimized baselines in both fusion quality and segmentation accuracy.
- Sc-CFM preserves the edge textures and color contrasts that segmentation relies on, while the ReAW decoder reduces feature redundancy in the fused image.
- End-to-end training propagates gradients across all task branches through shared parameters, exploiting inter-task consistency to improve performance.

## Abstract

Multimodal image fusion and semantic segmentation play pivotal roles in autonomous driving and robotic systems, yet their inherent interdependence remains underexplored. To address this gap and overcome performance bottlenecks, we propose SC-CoSF, a novel coupled framework that jointly optimizes these tasks through synergistic learning. Our approach replaces traditional duplex encoders with a weight-sharing CNN encoder, implicitly aligning multimodal features while reducing parameter overhead. The core innovation lies in our Self-correction and Collaboration Fusion Module (Sc-CFM), which integrates (1) a Self-correction Long-Range Relationship Branch (Sc-LRB) to strengthen global semantic modeling, (2) a Self-correction Fine-Grained Branch (Sc-FGB) for enhanced visual detail retention through local feature aggregation, and (3) a Dual-branch Collaborative Recalibration (DCR) mechanism for cross-task feature refinement. This design preserves critical edge textures and color contrasts for segmentation while leveraging segmentation-derived spatial priors to guide fusion. We further introduce the Interactive Context Recovery Mamba Decoder (ICRM) to restore lost long-range dependencies during the upsampling process; meanwhile, we propose the Region Adaptive Weighted Reconstruction Decoder (ReAW), which is mainly used to reduce feature redundancy in image fusion tasks. End-to-end joint training enables gradient propagation across all task branches via shared parameters, exploiting inter-task consistency for superior performance. Experiments demonstrate significant improvements over independently optimized baselines in both fusion quality and segmentation accuracy.
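The two structural ideas in the abstract, a weight-sharing encoder in place of duplex encoders and joint training through shared parameters, can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the linear encoder, the averaging "fusion", both loss definitions, the target values, and the weight `lam` are all hypothetical stand-ins chosen only to make the mechanics visible.

```python
# Toy sketch of weight sharing and a joint objective.
# Every name, shape, and loss below is a hypothetical placeholder,
# not the SC-CoSF architecture or its actual loss functions.

def make_encoder(in_dim, out_dim):
    """Deterministic positive weights for a toy linear encoder (out_dim x in_dim)."""
    return [[0.1 * (i + j + 1) for j in range(in_dim)] for i in range(out_dim)]

def encode(weights, x):
    """Linear projection followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def count_params(*encoders):
    """Total number of scalar weights across the given encoders."""
    return sum(len(row) for enc in encoders for row in enc)

def task_losses(weights, visible, infrared):
    """Return (fusion_loss, seg_loss), both computed from ONE shared encoder."""
    f_vis = encode(weights, visible)    # same weights for both modalities
    f_ir = encode(weights, infrared)
    fused = [(a + b) / 2 for a, b in zip(f_vis, f_ir)]             # placeholder fusion
    fusion_loss = sum((f - 1.0) ** 2 for f in fused) / len(fused)  # placeholder target
    seg_loss = (sum(fused) - 2.0) ** 2                             # placeholder target
    return fusion_loss, seg_loss

def joint_loss(weights, visible, infrared, lam=0.5):
    """Weighted sum of the two task losses; lam is an assumed balancing weight."""
    fusion_loss, seg_loss = task_losses(weights, visible, infrared)
    return fusion_loss + lam * seg_loss

visible = [0.2, 0.5, 0.1, 0.9]    # toy visible-light feature vector
infrared = [0.7, 0.1, 0.3, 0.4]   # toy infrared feature vector

shared = make_encoder(4, 3)
shared_params = count_params(shared)                                  # 12
duplex_params = count_params(make_encoder(4, 3), make_encoder(4, 3))  # 24

# Perturb a single shared weight: both task losses respond, which is why
# a joint objective sends gradient from both tasks into the same parameters.
base = task_losses(shared, visible, infrared)
perturbed_weights = [row[:] for row in shared]
perturbed_weights[0][0] += 0.01
perturbed = task_losses(perturbed_weights, visible, infrared)
```

In this toy setting the shared encoder carries half the parameters of the two-encoder baseline, and changing one shared weight moves both the fusion and segmentation losses. A real implementation would replace these placeholders with convolutional layers, task-specific decoders, and the losses defined in the full paper.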

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12196656/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12196656/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12196656/full.md

---
Source: https://tomesphere.com/paper/PMC12196656