Deep Transformer Network for Monocular Pose Estimation of Shipborne Unmanned Aerial Vehicle

Maneesha Wickramasuriya; Taeyoung Lee; Murray Snyder

arXiv:2406.09260 · cs.CV · February 3, 2026

TL;DR

This paper presents a deep transformer network that estimates the 6D pose of a UAV relative to a ship from monocular images, with potential applications to ship-based autonomous landing and navigation.

Contribution

It introduces a transformer-based approach, trained on a synthetic dataset of ship images, that detects 2D keypoints of multiple ship parts, estimates a 6D pose from each part, and combines the per-part estimates using Bayesian fusion for robustness.

Findings

01 Position estimation error of approximately 0.8% of the distance to the ship on synthetic data

02 Position estimation error of approximately 1.0% of the distance to the ship in flight experiments

03 Robust and accurate across various lighting conditions

Abstract

This paper introduces a deep transformer network for estimating the relative 6D pose of an Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are integrated using Bayesian fusion. The model is tested on synthetic data and in-situ flight experiments, demonstrating robustness and accuracy in various lighting conditions. The position estimation error is approximately 0.8% and 1.0% of the distance to the ship for the synthetic data and the flight experiments, respectively. The method has potential applications for ship-based autonomous UAV landing and navigation.
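
The fusion step and the error metric described in the abstract can be illustrated with a minimal sketch. It assumes each ship part yields an independent Gaussian estimate of the UAV position (a mean and a covariance) and fuses them as a product of Gaussians; the paper's actual fusion formulation, and its handling of orientation, may differ. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_gaussian_estimates(means, covariances):
    """Fuse independent Gaussian estimates of the same position.

    Assumption (not from the paper): estimates are independent, so the
    fused information matrix is the sum of the inverse covariances.
    """
    dim = np.asarray(means[0]).shape[0]
    info = np.zeros((dim, dim))
    info_vec = np.zeros(dim)
    for mu, cov in zip(means, covariances):
        cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))
        info += cov_inv
        info_vec += cov_inv @ np.asarray(mu, dtype=float)
    fused_cov = np.linalg.inv(info)
    fused_mean = fused_cov @ info_vec
    return fused_mean, fused_cov


def relative_position_error_percent(estimate, truth):
    """Position error as a percentage of the true distance to the ship."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 * np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


# Hypothetical per-part position estimates in metres (illustrative values).
part_means = [np.array([10.2, 0.1, -5.0]), np.array([9.9, -0.2, -5.1])]
part_covs = [np.diag([0.04, 0.04, 0.09]), np.diag([0.01, 0.01, 0.04])]

fused_mean, fused_cov = fuse_gaussian_estimates(part_means, part_covs)
error_pct = relative_position_error_percent(fused_mean, [10.0, 0.0, -5.0])
```

In this sketch the fused covariance is never larger than the smallest per-part covariance along any axis, which is the sense in which combining several parts adds robustness. Only position is fused here; orientation lives on the rotation group and would need a different treatment.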

Code & Models

Repositories

fdcl-gwu/tnn-mo
PyTorch · Official

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Inertial Sensor and Navigation

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer