# Multimodal Fusion Image Stabilization Algorithm for Bio-Inspired Flapping-Wing Aircraft

**Authors:** Zhikai Wang, Sen Wang, Yiwen Hu, Yangfan Zhou, Na Li, Xiaofeng Zhang

PMC · DOI: 10.3390/biomimetics10070448 · 2025-07-07

## TL;DR

This paper introduces FWStab, a video stabilization dataset for flapping-wing aircraft, together with a framework that stabilizes footage by fusing IMU sensor data with image features.

## Contribution

The main contribution is a multimodal fusion framework that stabilizes video by combining IMU data with image features. It is trained without supervision, using a joint loss function over camera pose smoothness and optical flow residuals.

## Key findings

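- The FWStab dataset includes 48 video clips across five typical flight scenarios, with synchronously collected IMU data for multimodal modeling.
- The proposed framework improves inter-frame stability and avoids the visual artifacts of traditional dense optical flow-based spatiotemporal warping.
- The framework uses an LSTM to model camera trajectories over time and a joint loss function for unsupervised training, enabling high-precision prediction of smooth trajectories.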
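
The joint loss mentioned above can be illustrated with a minimal sketch. This is not the paper's loss: the function names, the use of first- and second-order differences as the smoothness term, the treatment of poses as plain vectors, and the weights `w_smooth` and `w_flow` are all assumptions made for illustration.

```python
import numpy as np


def smoothness_loss(poses):
    """Penalize frame-to-frame change in a pose sequence of shape (T, D), T >= 3.

    Assumption: poses are plain vectors (e.g. rotation vectors), so simple
    differences are a reasonable proxy for angular velocity and acceleration.
    """
    poses = np.asarray(poses, dtype=float)
    velocity = np.diff(poses, n=1, axis=0)      # first-order differences
    acceleration = np.diff(poses, n=2, axis=0)  # second-order differences
    return float((velocity ** 2).sum(axis=1).mean()
                 + (acceleration ** 2).sum(axis=1).mean())


def flow_residual_loss(residual_flow):
    """Mean magnitude of the optical flow left between stabilized frames.

    residual_flow has shape (H, W, 2); how it is computed is out of scope here.
    """
    residual_flow = np.asarray(residual_flow, dtype=float)
    return float(np.linalg.norm(residual_flow, axis=-1).mean())


def joint_loss(poses, residual_flow, w_smooth=1.0, w_flow=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_smooth * smoothness_loss(poses)
            + w_flow * flow_residual_loss(residual_flow))
```

A perfectly still trajectory with no residual flow scores zero, and any jitter in the predicted poses or leftover motion between stabilized frames raises the loss.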

## Abstract

This paper presents FWStab, a specialized video stabilization dataset tailored for flapping-wing platforms. The dataset encompasses five typical flight scenarios, featuring 48 video clips with intense dynamic jitter. The corresponding Inertial Measurement Unit (IMU) sensor data are collected synchronously, and together they provide reliable support for multimodal modeling. Building on this dataset, and to address the poor image acquisition quality caused by severe vibrations in aerial vehicles, this paper proposes a multimodal signal fusion video stabilization framework. The framework integrates image features and inertial sensor features to predict smooth and stable camera poses. During stabilization, the real camera motion estimated from the sensors is warped to the smooth trajectory predicted by the network, thereby improving inter-frame stability. This approach maintains the global rigidity of scene motion, avoids the visual artifacts caused by traditional dense optical flow-based spatiotemporal warping, and rectifies rolling shutter-induced distortions. Furthermore, the network is trained in an unsupervised manner using a joint loss function that integrates camera pose smoothness and optical flow residuals. Coupled with a multi-stage training strategy, the framework adapts well to a wide range of scenarios. The entire framework employs Long Short-Term Memory (LSTM) to model the temporal characteristics of camera trajectories, enabling high-precision prediction of smooth trajectories.
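
The warping step described in the abstract can be sketched for the simplest case: a pure camera rotation and a global shutter, where moving from the real orientation to the smoothed one is a single homography, H = K · R_smooth · R_realᵀ · K⁻¹. This is a standard rotation-only model, not the paper's implementation. It ignores the rolling shutter correction the paper performs, and the intrinsics `K`, the world-to-camera rotation convention, and all function names are assumptions.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation about the optical axis (camera roll), used here as a toy pose."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


def stabilizing_homography(K, R_real, R_smooth):
    """Homography mapping pixels seen under R_real to pixels under R_smooth.

    Assumes rotations map world coordinates to camera coordinates and that
    camera translation is negligible.
    """
    return K @ R_smooth @ R_real.T @ np.linalg.inv(K)


def warp_point(H, xy):
    """Apply a homography to one pixel coordinate (x, y)."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]


# Hypothetical intrinsics: 500 px focal length, principal point at (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

R_real = rotation_z(np.deg2rad(5.0))  # jittery orientation from the sensors
R_smooth = rotation_z(0.0)            # smoothed orientation from the network
H = stabilizing_homography(K, R_real, R_smooth)
corrected = warp_point(H, (420.0, 240.0))
```

Because one homography moves every pixel consistently, the scene stays globally rigid, which is the property the abstract contrasts with dense optical flow-based warping.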

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12292680/full.md

---
Source: https://tomesphere.com/paper/PMC12292680