Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation
Zipeng Qi, Hao Chen, Chenyang Liu, Zhenwei Shi, Zhengxia Zou

TL;DR
This paper introduces Implicit Ray-Transformers, a novel approach combining implicit neural representations and multi-view 3D priors for accurate remote sensing image segmentation with limited labels.
Contribution
It proposes a two-stage learning framework that integrates 3D scene encoding with a Ray Transformer to improve multi-view remote sensing segmentation.
Findings
Outperforms CNN-based methods in accuracy.
Effective with sparse labels (4-6 per 100 images).
Works well on synthetic and real datasets.
Abstract
Mainstream CNN-based semantic segmentation approaches for remote sensing (RS) images typically rely on massive labeled training data. This paradigm struggles with RS multi-view scene segmentation when only a few views are labeled, because it does not consider the 3D information within the scene. In this paper, we propose the "Implicit Ray-Transformer (IRT)", based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels (such as 4-6 labels per 100 images). We explore a new way of introducing multi-view 3D structure priors to the task for accurate and view-consistent semantic segmentation. The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene based on multi-view images. In the second stage, we design a Ray Transformer to leverage…
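To make the first stage more concrete, here is a minimal sketch of how a neural field that stores color and 3D structure can be turned into a pixel color. It assumes NeRF-style volume rendering, which the abstract does not confirm; the function name `composite_along_ray` and its arguments are hypothetical, and the densities and colors are hand-written stand-ins for what a trained field would output at sample points along one camera ray.

```python
import math


def composite_along_ray(densities, colors, deltas):
    """Alpha-composite per-sample colors along one ray.

    Assumption: NeRF-style quadrature, not necessarily the paper's renderer.
    densities: non-negative volume density at each sample point
    colors:    (r, g, b) tuple at each sample point
    deltas:    distance covered by each sample's segment along the ray
    Returns the rendered (r, g, b) and the per-sample blending weights.
    """
    transmittance = 1.0  # fraction of light still unblocked before this sample
    rendered = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        weights.append(weight)
        for channel in range(3):
            rendered[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha            # light left for later samples
    return tuple(rendered), weights


# Toy ray: empty space, then a dense green surface, then a faint blue sample.
rgb, weights = composite_along_ray(
    densities=[0.0, 50.0, 1.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
print(rgb, weights)
```

In this toy ray the dense second sample receives almost all of the weight, so the rendered pixel is dominated by the surface the ray hits first. That is how 3D structure ends up shaping each view under this assumption.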
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Entropy Regularization · Proximal Policy Optimization · Linear Layer · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections
