Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding

Chih-Chung Hsu; I-Hsuan Wu; Wen-Hai Tseng; Ching-Heng Cheng; Ming-Hsuan Wu; Jin-Hui Jiang; Yu-Jou Hsiao

arXiv:2505.06991 · cs.CV · May 13, 2025

TL;DR

This paper introduces a robust outdoor semantic segmentation framework for the ICRA 2025 GOOSE 2D Semantic Segmentation Challenge. It combines a RoPE-enhanced Swin Transformer, color shift correction, and quantile-based label denoising, and reaches a mean IoU of 0.848 on the GOOSE test set.
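A RoPE-enhanced backbone rotates query and key features by angles proportional to token position, so attention scores depend on relative rather than absolute position. The sketch below shows one common axial form of 2D rotary position embedding in NumPy: half of the channels are rotated by the token's column index, the other half by its row index. This page does not say which 2D variant the authors use or how it is wired into Swin's windowed attention, so `rope_1d`, `rope_2d`, and the `base=10000.0` frequency constant are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive channel pairs of x by angles pos * inv_freq.

    x:   (..., n, d) features, d even
    pos: (n,) position of each token along one axis
    """
    x = np.asarray(x, dtype=float)
    pos = np.asarray(pos, dtype=float)
    d = x.shape[-1]
    assert d % 2 == 0, "channel dim must be even"
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # (d/2,) frequencies
    angles = pos[:, None] * inv_freq[None, :]           # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin          # 2x2 rotation per pair
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def rope_2d(x, ys, xs, base=10000.0):
    """Axial 2D RoPE (hypothetical variant): first half of channels encodes
    the column index xs, second half encodes the row index ys."""
    d = np.shape(x)[-1]
    assert d % 4 == 0, "channel dim must be divisible by 4"
    half = d // 2
    return np.concatenate(
        [rope_1d(x[..., :half], xs, base), rope_1d(x[..., half:], ys, base)],
        axis=-1,
    )

# Usage: tokens of a 7x7 attention window, queries and keys both rotated.
H, W, d = 7, 7, 32
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
ys, xs = ys.ravel(), xs.ravel()
rng = np.random.default_rng(0)
q = rng.standard_normal((H * W, d))
k = rng.standard_normal((H * W, d))
scores = rope_2d(q, ys, xs) @ rope_2d(k, ys, xs).T / np.sqrt(d)   # (49, 49)
```

Because queries and keys are both rotated, the score between two tokens depends only on their row and column offsets, which is the property usually credited for RoPE's better generalization across positions and input sizes.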

Contribution

Integrating a RoPE-Swin backbone, color shift correction, and quantile-based label denoising improves the robustness of outdoor scene segmentation under real-world conditions.
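This page does not describe how the Color Shift Estimation-and-Correction module works internally. To make the estimate-then-correct idea concrete, the sketch below swaps in the classical gray-world assumption (the average scene color should be neutral gray): it estimates a per-channel gain and divides it out. This is a stand-in technique, not the authors' module, and `estimate_color_shift` and `correct_color_shift` are hypothetical names.

```python
import numpy as np

def estimate_color_shift(img, eps=1e-8):
    """Gray-world estimate of a global color cast.

    img: (H, W, 3) float image in [0, 1].
    Returns per-channel gains; a gain > 1 means that channel is
    over-represented relative to a neutral gray average.
    """
    channel_means = img.reshape(-1, 3).mean(axis=0)   # (3,)
    gray = channel_means.mean()
    return channel_means / (gray + eps)

def correct_color_shift(img, gains, eps=1e-8):
    """Divide out the estimated gains and clip back to the valid range."""
    return np.clip(img / (gains + eps), 0.0, 1.0)

# Usage: a grayscale texture with a warm cast applied, then corrected.
rng = np.random.default_rng(0)
g = rng.uniform(0.0, 0.8, size=(8, 8, 1))
tinted = g * np.array([1.2, 1.0, 0.8])
gains = estimate_color_shift(tinted)
fixed = correct_color_shift(tinted, gains)
```

A learned module can adapt to scene content that violates the gray-world assumption (for example, images dominated by vegetation), which is one reason a fixed heuristic like this is only an illustration.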

Findings

01

Achieved a mean IoU of 0.848 on the GOOSE test set.

02

Demonstrated the effectiveness of the color shift correction and label denoising strategies.

03

Improved spatial generalization through Rotary Position Embedding (RoPE).
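The headline result is a mean Intersection over Union. A minimal sketch of how mIoU is typically computed from a confusion matrix follows (for this challenge, `num_classes` would be 9). The `ignore_index=255` convention, skipping classes with an empty union, and accumulating one confusion matrix over the whole test set before averaging are assumptions about the evaluation protocol, not details given on this page.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows index the ground-truth class, columns the predicted class."""
    mask = target != ignore_index
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU averages classes with a non-empty union."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp      # predicted as c, labeled otherwise
    fn = conf.sum(axis=1) - tp      # labeled c, predicted otherwise
    denom = tp + fp + fn
    valid = denom > 0
    iou = np.zeros_like(tp)
    iou[valid] = tp[valid] / denom[valid]
    return iou[valid].mean(), iou

# Usage: accumulate over images, then compute once.
conf = np.zeros((2, 2), dtype=np.int64)
for pred, target in [(np.array([0, 0]), np.array([0, 1])),
                     (np.array([1, 1]), np.array([1, 1]))]:
    conf += confusion_matrix(pred, target, num_classes=2)
miou, per_class = mean_iou(conf)   # class 0: 1/2, class 1: 2/3
```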

Abstract

This report presents our semantic segmentation framework developed by team ACVLAB for the ICRA 2025 GOOSE 2D Semantic Segmentation Challenge, which focuses on parsing outdoor scenes into nine semantic categories under real-world conditions. Our method integrates a Swin Transformer backbone enhanced with Rotary Position Embedding (RoPE) for improved spatial generalization, alongside a Color Shift Estimation-and-Correction module designed to compensate for illumination inconsistencies in natural environments. To further improve training stability, we adopt a quantile-based denoising strategy that downweights the top 2.5% of highest-error pixels, treating them as noise and suppressing their influence during optimization. Evaluated on the official GOOSE test set, our approach achieved a mean Intersection over Union (mIoU) of 0.848, demonstrating the effectiveness of combining color…
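The quantile-based denoising step can be sketched as a per-pixel cross-entropy in which pixels whose loss exceeds the batch's 97.5th percentile (the top 2.5%) are treated as label noise. The abstract says these pixels are downweighted but not by how much, so `noise_weight` (0.0 drops them entirely) is a hypothetical parameter, as are computing the quantile per batch and the `ignore_index=255` convention. This NumPy version only shows the forward computation.

```python
import numpy as np

def pixel_cross_entropy(logits, labels, ignore_index=255):
    """logits: (N, C, H, W); labels: (N, H, W) ints.
    Returns per-pixel negative log-likelihood (N, H, W) and a validity mask."""
    valid = labels != ignore_index
    safe = np.where(valid, labels, 0)                       # placeholder index for ignored pixels
    z = logits - logits.max(axis=1, keepdims=True)          # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -np.take_along_axis(log_probs, safe[:, None], axis=1)[:, 0]
    return np.where(valid, nll, 0.0), valid

def quantile_denoised_loss(logits, labels, q=0.975, noise_weight=0.0, ignore_index=255):
    """Weighted mean cross-entropy; pixels above the q-quantile of the
    per-pixel loss get weight `noise_weight` (hypothetical), others get 1."""
    loss, valid = pixel_cross_entropy(logits, labels, ignore_index)
    if not valid.any():
        return 0.0
    thresh = np.quantile(loss[valid], q)
    weights = np.where(loss > thresh, noise_weight, 1.0) * valid
    return (weights * loss).sum() / max(weights.sum(), 1e-12)

# Usage: 39 confidently correct pixels and 1 pixel whose label disagrees strongly.
logits = np.zeros((1, 2, 1, 40))
logits[0, 0, 0, :] = 2.0
logits[0, 0, 0, -1] = 0.0
logits[0, 1, 0, -1] = 5.0
labels = np.zeros((1, 1, 40), dtype=np.int64)
denoised = quantile_denoised_loss(logits, labels)          # outlier pixel suppressed
plain = quantile_denoised_loss(logits, labels, noise_weight=1.0)
```

In an autograd framework the threshold would normally be computed from detached losses, so that it acts as a fixed per-batch cutoff rather than a quantity gradients flow through.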

Taxonomy

Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Linear Layer · Stochastic Depth · Multi-Head Attention · Dense Connections · Swin Transformer · ADaptive gradient method with the OPTimal convergence rate · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer