A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data
Hayat Rajani, Nuno Gracias, Rafael Garcia

TL;DR
This paper introduces a convolutional Vision Transformer architecture for semantic segmentation of side-scan sonar data, reporting state-of-the-art seafloor segmentation accuracy at real-time speed.
Contribution
The work presents a new ViT-based encoder-decoder model with specialized modules for low-data regimes and multiscale features, optimized for seabed habitat classification.
Findings
Achieved state-of-the-art segmentation accuracy.
Demonstrated real-time processing capability.
Effective in low-data scenarios.
Abstract
Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low data regimes, we propose a novel…
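As a rough illustration of the tokenization step that any ViT-style encoder relies on, the sketch below splits a small 2-D intensity tile into flattened patch tokens. It is not the authors' module: the function name `extract_patches`, the toy 8x8 tile, and the patch and stride values are all hypothetical. Setting the stride smaller than the patch size makes neighbouring tokens share pixels, which is one generic way to give a transformer some of the locality that convolutions provide.

```python
def extract_patches(image, patch, stride):
    """Split a 2-D list-of-lists image into flattened patch tokens.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Flatten the patch row by row into a single token vector.
            token = [image[top + r][left + c]
                     for r in range(patch) for c in range(patch)]
            tokens.append(token)
    return tokens


# Toy 8x8 tile standing in for a sonar intensity map (values are arbitrary).
tile = [[row * 8 + col for col in range(8)] for row in range(8)]

plain = extract_patches(tile, patch=4, stride=4)    # non-overlapping patches
overlap = extract_patches(tile, patch=4, stride=2)  # overlapping patches

print(len(plain), len(overlap))
```

With non-overlapping patches each pixel belongs to exactly one token, so any relation between adjacent patches has to be learned through attention; with overlapping patches, adjacent tokens already share part of their input.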
Taxonomy
Topics: Underwater Acoustics Research · Underwater Vehicles and Communication Systems · Seismic Imaging and Inversion Techniques
Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Adam · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Dropout · Byte Pair Encoding
