A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data

Hayat Rajani; Nuno Gracias; Rafael Garcia

arXiv:2302.12416 · cs.CV · September 8, 2023

TL;DR

This paper introduces a novel convolutional vision transformer architecture tailored for semantic segmentation of side-scan sonar data, achieving state-of-the-art results in marine seafloor mapping with real-time performance.

Contribution

The work presents a new ViT-based encoder-decoder model with specialized modules for low-data regimes and multiscale features, optimized for seabed habitat classification.

Findings

01 Achieved state-of-the-art segmentation accuracy.

02 Demonstrated real-time processing capability.

03 Effective in low-data scenarios.

Abstract

Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low data regimes, we propose a novel…

Code & Models

Repositories

hayatrajani/s3seg-vit (PyTorch, official)

Taxonomy

Topics: Underwater Acoustics Research · Underwater Vehicles and Communication Systems · Seismic Imaging and Inversion Techniques

Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Adam · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Dropout · Byte Pair Encoding