Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar

Taeyoun Kwon; Youngwon Choi; Hyeonyu Kim; Myeongkyun Cho; Junhyeok Choi; Moon Hwan Kim

arXiv:2604.00383 · cs.CV · April 2, 2026

TL;DR

Mine-JEPA is an in-domain self-supervised learning approach for mine classification in side-scan sonar images. In a data-scarce setting, it outperforms a fine-tuned DINOv3 foundation model while using fewer parameters.

Contribution

The paper presents Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification. It is pretrained on only 1,170 unlabeled sonar images and still outperforms a fine-tuned foundation model.

Findings

01

Mine-JEPA achieves an F1 score of 0.935 in binary mine vs. non-mine classification.

02

It outperforms fine-tuned DINOv3 (F1 of 0.922 in the binary setting) despite using fewer parameters.

03

Applying in-domain SSL to foundation models can degrade performance.

Abstract

Side-scan sonar (SSS) mine classification is a challenging maritime vision problem characterized by extreme data scarcity and a large domain gap from natural images. While self-supervised learning (SSL) and general-purpose vision foundation models have shown strong performance in general vision and several specialized domains, their use in SSS remains largely unexplored. We present Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification, using SIGReg, a regularization-based SSL loss, to pretrain on only 1,170 unlabeled sonar images. In the binary mine vs. non-mine setting, Mine-JEPA achieves an F1 score of 0.935, outperforming fine-tuned DINOv3 (0.922), a foundation model pretrained on 1.7B images. For 3-class mine-like object classification, Mine-JEPA reaches 0.820 with synthetic data augmentation, again outperforming fine-tuned DINOv3 (0.810). We further observe that…
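The abstract reports results as F1 scores. As a reminder of what that metric measures, here is a minimal sketch of binary F1 computed from scratch. It assumes the score is taken on the positive ("mine") class; the abstract does not say which averaging scheme the authors use, and the labels below are made-up illustrative values, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = mine, 0 = non-mine), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, which matches the data-scarce mine vs. non-mine setting described above.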
