Glass Segmentation with Fusion of Learned and General Visual Features

Risto Ojala; Tristan Ellison; Mo Chen

arXiv:2603.03718·cs.CV·March 5, 2026

TL;DR

This paper introduces a glass segmentation architecture that fuses task-specific learned features with general visual features, achieving state-of-the-art accuracy on four glass segmentation datasets at competitive inference speed.

Contribution

The paper proposes a dual-backbone architecture utilizing a frozen DINOv3 model and a trained Swin model for improved glass segmentation.

Findings

1. Achieved state-of-the-art accuracy on four glass segmentation datasets.
2. The model has competitive inference speed, surpassing previous methods with a lighter backbone.
3. The approach effectively combines learned and general features for transparent object segmentation.

Abstract

Glass surface segmentation from RGB images is a challenging task, since glass, as a transparent material, lacks distinct visual characteristics. However, glass segmentation is critical for scene understanding and robotics, as transparent glass surfaces must be identified as solid material. This paper presents a novel architecture for glass segmentation, deploying a dual backbone that produces both general visual features and task-specific learned visual features. General visual features are produced by a frozen DINOv3 vision foundation model, and the task-specific features are generated with a Swin model trained in a supervised manner. The resulting multi-scale feature representations are downsampled with residual Squeeze-and-Excitation Channel Reduction and fed into a Mask2Former Decoder, which produces the final segmentation masks. The architecture was evaluated on four commonly used glass…
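
The fusion step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how the two backbones' features are combined or where the residual connection sits, so the sketch assumes channel-wise concatenation at one scale, a 1x1 projection that reduces the channel count, and a squeeze-and-excitation gate added back to the projected features. The function names `se_channel_reduce` and `fuse_scale`, all weight shapes, and the random stand-in features are hypothetical, and both feature maps are assumed to already share the same spatial size.

```python
import numpy as np


def se_channel_reduce(feat, w_proj, w_down, w_up):
    """Reduce channels of a (C_in, H, W) map with an SE-gated residual.

    Assumed layout (not taken from the paper):
      w_proj: (C_out, C_in)  1x1 convolution weights
      w_down: (C_mid, C_out) SE bottleneck, first layer
      w_up:   (C_out, C_mid) SE bottleneck, second layer
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    reduced = np.einsum("oc,chw->ohw", w_proj, feat)
    # Squeeze: global average pooling gives one statistic per channel.
    squeezed = reduced.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU, then sigmoid) gives per-channel gates.
    hidden = np.maximum(w_down @ squeezed, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))
    # Residual: the gated features are added to the ungated projection.
    return reduced + reduced * gate[:, None, None]


def fuse_scale(general_feat, learned_feat, w_proj, w_down, w_up):
    """Concatenate general and learned features, then reduce channels."""
    fused = np.concatenate([general_feat, learned_feat], axis=0)
    return se_channel_reduce(fused, w_proj, w_down, w_up)


# Random stand-ins for one scale of backbone output (hypothetical sizes).
rng = np.random.default_rng(0)
general = rng.standard_normal((8, 4, 4))   # frozen foundation-model features
learned = rng.standard_normal((6, 4, 4))   # trained task-specific features
w_proj = rng.standard_normal((4, 14)) * 0.1
w_down = rng.standard_normal((2, 4)) * 0.1
w_up = rng.standard_normal((4, 2)) * 0.1

out = fuse_scale(general, learned, w_proj, w_down, w_up)
print(out.shape)
```

In a full model, a step like this would run once per scale, and the reduced maps would be passed to the decoder. Only the task-specific backbone and the fusion weights would receive gradient updates, since the general-feature backbone is frozen.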

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection