DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

Anju Rani; Daniel Ortiz-Arroyo; Petar Durdevic

arXiv:2605.04943·cs.CV·May 7, 2026

TL;DR

DART is a comprehensive vision-language model for rope condition monitoring that provides damage detection, severity assessment, and automated reporting from a single image using a unified multi-task architecture.

Contribution

The paper introduces DART, a novel multi-task foundation model that integrates vision and language for full-spectrum rope condition inspection without task-specific fine-tuning.

Findings

1. DART achieves 93.22% accuracy in damage classification.

2. It attains a Spearman rho of 0.94 in severity regression.

3. In few-shot damage recognition, it reaches 89.2% macro-F1 at 20 shots.
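
For readers unfamiliar with the three metrics quoted in the findings, the following is a minimal sketch of how accuracy, macro-F1, and Spearman rho are conventionally computed. It is not the paper's evaluation code: the function names, the class labels (`normal`, `cut`, `abrasion`), and the toy severity values are all hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data, not taken from the paper.
labels_true = ["normal", "cut", "abrasion", "cut", "normal", "abrasion"]
labels_pred = ["normal", "cut", "cut", "cut", "normal", "abrasion"]
severity_true = [0.1, 0.4, 0.35, 0.8, 0.6]
severity_pred = [0.2, 0.5, 0.3, 0.9, 0.4]

print(accuracy(labels_true, labels_pred))
print(macro_f1(labels_true, labels_pred))
print(spearman_rho(severity_true, severity_pred))
```

Spearman rho depends only on the ordering of the predicted severities, not their scale, which is why it suits a regression head whose outputs are used to rank ropes by urgency. Macro-F1 averages over classes without weighting by frequency, a common choice when some damage types have few examples.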

Abstract

The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architecture. DART extends the Joint-Embedding Predictive Architecture (JEPA) to the cross-modal domain by coupling a Vision Transformer (ViT-H/14) with Llama-3.2-3B-Instruct via a Severity-Conditioned Cross-Modal Fusion (SC-CMF) module. Three architectural innovations drive the model's versatility: (1) HD-MASK, a saliency-guided masking strategy that focuses self-supervised reconstruction on…
