# A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data

**Authors:** Hayat Rajani, Nuno Gracias, Rafael Garcia

arXiv: 2302.12416 · 2023-09-08

## TL;DR

This paper introduces a novel convolutional vision transformer architecture tailored for semantic segmentation of side-scan sonar data, achieving state-of-the-art results in marine seafloor mapping with real-time performance.

## Contribution

The work presents a new ViT-based encoder-decoder model with specialized modules for low-data regimes and multiscale features, optimized for seabed habitat classification.

## Key findings

- Achieved state-of-the-art segmentation accuracy.
- Demonstrated real-time processing capability.
- Effective in low-data scenarios.

## Abstract

Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low-data regimes, we propose a novel feature extraction module to replace the Multi-layer Perceptron (MLP) block within transformer layers and a novel module to extract multiscale patch embeddings. A lightweight decoder is also proposed to complement this design in order to further boost multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results and also meet real-time computational requirements. We make our code available at https://github.com/hayatrajani/s3seg-vit.
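
To make the idea of multiscale patch embeddings concrete, here is a minimal sketch in plain Python. It does not reproduce the paper's module, whose details are not given in this summary. It assumes a toy setting in which each token is built by averaging the acoustic intensity in overlapping windows of several odd sizes centred on the same grid location, so that one token carries both fine and coarse context. The function names `window_mean` and `multiscale_tokens`, the window sizes, and the stride are all hypothetical; a real model would use learned projections rather than fixed averages.

```python
from typing import List, Sequence

Grid = List[List[float]]  # 2D intensity map, e.g. one side-scan sonar tile


def window_mean(image: Grid, row: int, col: int, size: int) -> float:
    """Mean intensity of a size x size window centred on (row, col).

    Cells outside the image count as zero (zero padding). `size` must be odd.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < height and 0 <= c < width:
                total += image[r][c]
    return total / (size * size)


def multiscale_tokens(
    image: Grid, sizes: Sequence[int] = (1, 3), stride: int = 2
) -> List[List[float]]:
    """Build one token per grid location, with one feature per window size.

    Illustrative only: fixed averaging stands in for a learned embedding.
    """
    height, width = len(image), len(image[0])
    tokens = []
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            tokens.append([window_mean(image, row, col, s) for s in sizes])
    return tokens


if __name__ == "__main__":
    # A hypothetical 4x4 tile of uniform intensity.
    tile = [[1.0] * 4 for _ in range(4)]
    for token in multiscale_tokens(tile):
        print(token)
```

Because windows larger than the stride overlap, neighbouring tokens share pixels, which is one simple way to inject the kind of locality that convolutions provide and that plain non-overlapping ViT patches lack.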

---
Source: https://tomesphere.com/paper/2302.12416