Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation

Zipeng Qi; Hao Chen; Chenyang Liu; Zhenwei Shi; Zhengxia Zou

arXiv:2303.08401 · cs.CV · August 9, 2023 · 1 citation

TL;DR

This paper introduces Implicit Ray-Transformers, a novel approach combining implicit neural representations and multi-view 3D priors for accurate remote sensing image segmentation with limited labels.

Contribution

It proposes a two-stage learning framework that integrates 3D scene encoding with a Ray Transformer to improve multi-view remote sensing segmentation.

Findings

01. Outperforms CNN-based methods in accuracy.

02. Effective with sparse labels (4-6 per 100 images).

03. Works well on synthetic and real datasets.

Abstract

Mainstream CNN-based approaches to remote sensing (RS) image semantic segmentation typically rely on massive labeled training data. This paradigm struggles with RS multi-view scene segmentation when only a few views are labeled, because it does not consider the 3D information within the scene. In this paper, we propose the "Implicit Ray-Transformer (IRT)", based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels (such as 4-6 labels per 100 images). We explore a new way of introducing multi-view 3D structure priors to the task for accurate and view-consistent semantic segmentation. The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene based on multi-view images. In the second stage, we design a Ray Transformer to leverage…
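As a rough illustration of the first stage, the sketch below shows how a neural field's per-sample outputs along one camera ray can be composited into a single pixel color. It assumes NeRF-style volume rendering, which this abstract excerpt does not confirm; the function name `composite_along_ray` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def composite_along_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (NeRF-style quadrature).

    Assumed inputs (hypothetical, for illustration only):
      sigmas: (N,) non-negative densities predicted by the field
      colors: (N, 3) RGB values predicted by the field
      deltas: (N,) distances between consecutive samples
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Opacity contributed by each sample interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance: fraction of light that reaches sample i
    # without being absorbed by samples 0..i-1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])

    # Per-sample contribution to the rendered pixel.
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

In a setup like this, the field would be optimized by comparing `rgb` against the observed pixel color in each view. The same `weights` could in principle also aggregate per-sample semantic features, but how the paper's Ray Transformer does this is not stated in the excerpt above.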

Taxonomy

Topics: Remote Sensing and LiDAR Applications · Advanced Vision and Imaging · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Entropy Regularization · Proximal Policy Optimization · Linear Layer · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections