EGSA-PT: Edge-Guided Spatial Attention with Progressive Training for Monocular Depth Estimation and Segmentation of Transparent Objects
Gbenga Omotara, Ramy Farag, Seyed Mohamad Ali Tousi, G.N. DeSouza

TL;DR
This paper introduces EGSA, a boundary-aware fusion mechanism, and a progressive training strategy that together improve depth estimation and segmentation of transparent objects without requiring ground-truth depth during training.
Contribution
The paper proposes a novel edge-guided spatial attention fusion method and a multi-modal progressive training approach for better transparent object perception.
Findings
EGSA improves depth accuracy over the current state-of-the-art method (MODEST) while keeping segmentation performance competitive.
The progressive training strategy eliminates the need for ground-truth depth.
The largest improvements on the Syn-TODD and ClearPose benchmarks appear in transparent regions.
Abstract
Transparent object perception remains a major challenge in computer vision research, as transparency confounds both depth estimation and semantic segmentation. Recent work has explored multi-task learning frameworks to improve robustness, yet negative cross-task interactions often hinder performance. In this work, we introduce Edge-Guided Spatial Attention (EGSA), a fusion mechanism designed to mitigate destructive interactions by incorporating boundary information into the fusion between semantic and geometric features. On both the Syn-TODD and ClearPose benchmarks, EGSA consistently improved depth accuracy over the current state-of-the-art method (MODEST) while preserving competitive segmentation performance, with the largest improvements appearing in transparent regions. Besides our fusion design, our second contribution is a multi-modal progressive training strategy, where learning…
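The abstract describes EGSA only at a high level: boundary information steers how semantic and geometric features are fused. The NumPy sketch below shows one way such a gate could look. It is not the paper's architecture: the function name `edge_guided_fusion`, the agreement term, the `w_edge` and `bias` parameters, and the residual form of the fusion are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def edge_guided_fusion(sem, geo, edge, w_edge=2.0, bias=0.0):
    """Hypothetical edge-guided spatial gate (illustrative assumption only).

    sem, geo : arrays of shape (C, H, W), semantic and geometric features.
    edge     : array of shape (H, W), boundary likelihood in [0, 1].
    Returns the fused features (C, H, W) and the attention map (H, W).
    """
    if sem.shape != geo.shape or sem.shape[1:] != edge.shape:
        raise ValueError("sem and geo must be (C, H, W); edge must be (H, W)")

    # Per-pixel agreement between the two feature streams (channel mean of
    # their elementwise product).
    agreement = (sem * geo).mean(axis=0)

    # The edge map raises the gate near boundaries, so more semantic
    # evidence is admitted there than in flat regions.
    attn = sigmoid(agreement + w_edge * edge + bias)

    # Residual fusion: geometric features pass through unchanged, and the
    # semantic features are added in proportion to the spatial gate.
    fused = geo + attn[None, :, :] * sem
    return fused, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sem = rng.normal(size=(8, 4, 4))
    geo = rng.normal(size=(8, 4, 4))
    edge = np.zeros((4, 4))
    edge[1:3, 1:3] = 1.0  # toy boundary region
    fused, attn = edge_guided_fusion(sem, geo, edge)
    print(fused.shape, attn.shape)
```

With this residual form, a pixel whose semantic features are all zero keeps its geometric features unchanged, and the gate is always higher at an edge pixel than at a non-edge pixel with the same feature agreement.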
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Visual Attention and Saliency Detection
