# Brownian motion data augmentation: a method to push neural network performance on nanopore sensors

**Authors:** Javier Kipen, Joakim Jaldén

PMC · DOI: 10.1093/bioinformatics/btaf323 · 2025-05-29

## TL;DR

This paper introduces a data augmentation method using Brownian motion to improve neural network performance for nanopore sensor data analysis.

## Contribution

A data augmentation technique based on Brownian motion, and a new neural network, YupanaNet, that outperforms the previously published QuipuNet on a nanopore DNA-barcode classification dataset.

## Key findings

- Brownian motion data augmentation produces a noticeable increase in QuipuNet's classification accuracy.
- YupanaNet achieves 95.8% accuracy, surpassing QuipuNet's 94.6% on the same dataset.
- YupanaNet combines the improved generalization from Brownian motion data augmentation with architectural additions, including skip connections and a soft attention mask.

## Abstract

Nanopores are highly sensitive sensors that have achieved commercial success in DNA/RNA sequencing, with potential applications in protein sequencing and biomarker identification. Solid-state nanopores, in particular, face challenges such as instability and low signal-to-noise ratios, which lead scientists to adopt data-driven methods for nanopore signal analysis, although data acquisition remains restrictive.

We address this data scarcity by augmenting the training samples with traces that emulate Brownian motion effects, based on dynamic models in the literature. We apply this method to a publicly available dataset of a classification task containing nanopore reads of DNA with encoded barcodes. A neural network named QuipuNet was previously published for this dataset, and we demonstrate that our augmentation method produces a noticeable increase in QuipuNet’s accuracy. Furthermore, we introduce a novel neural network named YupanaNet, which achieves greater accuracy (95.8%) than QuipuNet (94.6%) on the same dataset. YupanaNet benefits from both the enhanced generalization provided by Brownian motion data augmentation and the incorporation of novel architectures, including skip connections and a soft attention mask.
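To make the augmentation idea concrete, the following is a minimal sketch of how a trace might be perturbed to emulate Brownian motion effects. It is not the authors' implementation: it assumes the effect can be approximated by a random walk with drift along the time axis, after which the trace is resampled by linear interpolation. The function name `brownian_time_warp` and the parameter `sigma` are hypothetical, and the repository linked below should be consulted for the actual method.

```python
import random


def brownian_time_warp(trace, sigma=0.1, seed=None):
    """Resample `trace` along a randomly perturbed time axis.

    Illustrative assumption (not taken from the paper): the position along
    the trace advances by a nominal step of 1 plus Gaussian noise, i.e. a
    random walk with drift. Steps are clipped to stay positive so that the
    warped time axis remains monotonic, which is a simplification.
    """
    rng = random.Random(seed)
    n = len(trace)
    if n < 2:
        return list(trace)

    # Noisy positive increments: drift of 1 sample plus Gaussian fluctuation.
    steps = [max(1e-3, 1.0 + rng.gauss(0.0, sigma)) for _ in range(n - 1)]

    # Cumulative sum of the increments gives the warped sample positions.
    positions = [0.0]
    for step in steps:
        positions.append(positions[-1] + step)

    # Rescale so the warped axis still spans the whole trace, [0, n - 1].
    scale = (n - 1) / positions[-1]
    positions = [p * scale for p in positions]

    # Linearly interpolate the original trace at the warped positions.
    warped = []
    for p in positions:
        i = min(int(p), n - 2)
        frac = p - i
        warped.append((1.0 - frac) * trace[i] + frac * trace[i + 1])
    return warped


if __name__ == "__main__":
    # Toy trace with two rectangular dips standing in for barcode features.
    toy = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
    print(brownian_time_warp(toy, sigma=0.2, seed=0))
```

In a training loop, a sketch like this would be applied to each training trace with a fresh random seed per epoch, so the network sees a slightly different time-warped copy of every read while the class label stays unchanged.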

The source code and data are available at: https://github.com/JavierKipen/browDataAug.

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12208072/full.md

---
Source: https://tomesphere.com/paper/PMC12208072