# Deep Learning-Based Eye-Writing Recognition with Improved Preprocessing and Data Augmentation Techniques

**Authors:** Kota Suzuki, Abu Saleh Musa Miah, Jungpil Shin

PMC · DOI: 10.3390/s25206325 · Sensors (Basel, Switzerland) · 2025-10-13

## TL;DR

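This paper introduces a vision-based eye-writing recognition system that uses a webcam and a hybrid CNN-TCN model. With DFT-based length normalization and data augmentation, it reaches 97.68% accuracy on a new webcam-captured dataset under leave-one-subject-out (LOSO) cross-validation.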

## Contribution

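The key innovations are a DFT-based length normalization method, which standardizes the length of each eye-writing sample while preserving its essential spectral characteristics, and a hybrid model that combines 1D CNN and TCN layers for eye-writing recognition.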
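To make the TCN half of the hybrid model more concrete, the following is a minimal sketch of a dilated causal 1D convolution, the building block that TCNs are generally based on. It is not taken from the paper: the function names, kernel values, and dilation schedule are all illustrative assumptions, and the authors' actual layer configuration is not given in this summary.

```python
def dilated_causal_conv1d(signal, kernel, dilation=1):
    """Dilated causal convolution over a 1D sequence (illustrative only).

    kernel[0] weights the current sample and kernel[i] weights the sample
    i * dilation steps in the past. Positions before the start of the
    sequence are treated as zero, so no output depends on future samples.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past-and-present samples seen after stacking the layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical usage: one coordinate of a gaze trajectory.
gaze_x = [1, 2, 3, 4, 5]
print(dilated_causal_conv1d(gaze_x, kernel=[1, 1], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Stacking such layers with growing dilation lets the receptive field expand quickly with depth, which is the usual motivation for using a TCN on sequences such as gaze trajectories.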

## Key findings

- The proposed system achieved 97.68% accuracy on a new webcam-captured Arabic numbers dataset.
- The model outperformed existing systems on two benchmark datasets, with 94.48% accuracy on the Japanese Katakana dataset and 98.70% on the EOG-captured Arabic numbers dataset.
- DFT-based normalization improved input uniformity and model robustness.
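The idea behind DFT-based length normalization can be sketched as generic Fourier resampling: take the DFT of a variable-length sequence, keep the frequency components that fit the target length, and evaluate the resulting trigonometric interpolant at evenly spaced points. The code below is a minimal sketch under that assumption and is not the authors' implementation. The function names and the rule for which frequencies to keep are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_resample(x, target_len):
    """Resample x to target_len samples via its DFT (illustrative sketch).

    Frequencies above what target_len samples can represent are dropped
    (assumed rule: keep |freq| <= (target_len - 1) // 2), then the
    interpolant is evaluated at target_len evenly spaced positions.
    """
    n = len(x)
    if n == 0 or target_len <= 0:
        raise ValueError("need a non-empty input and a positive target length")
    spectrum = dft(x)
    max_freq = min(n // 2, (target_len - 1) // 2)
    out = []
    for j in range(target_len):
        t = j * n / target_len  # position on the original time axis
        value = 0j
        for k in range(n):
            freq = k if k <= n // 2 else k - n  # signed frequency index
            if abs(freq) <= max_freq:
                value += spectrum[k] * cmath.exp(2j * math.pi * freq * t / n)
        out.append(value.real / n)
    return out


# Hypothetical usage: stretch an 8-sample cosine to 16 samples.
wave = [math.cos(2 * math.pi * t / 8) for t in range(8)]
resampled = dft_resample(wave, 16)
print(len(resampled))
```

For a 2D gaze trajectory, the same function could be applied to the x and y coordinates separately so that every sample reaches the same length before being passed to the model.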

## Abstract

Eye-tracking technology enables communication for individuals with muscle control difficulties, making it a valuable assistive tool. Traditional systems rely on electrooculography (EOG) or infrared devices, which are accurate but costly and invasive. While vision-based systems offer a more accessible alternative, they have not been extensively explored for eye-writing recognition. Additionally, the natural instability of eye movements and variations in writing styles result in inconsistent signal lengths, which reduces recognition accuracy and limits the practical use of eye-writing systems. To address these challenges, we propose a novel vision-based eye-writing recognition approach that utilizes a webcam-captured dataset. A key contribution of our approach is the introduction of a Discrete Fourier Transform (DFT)-based length normalization method that standardizes the length of each eye-writing sample while preserving essential spectral characteristics. This ensures uniformity in input lengths and improves both efficiency and robustness. Moreover, we integrate a hybrid deep learning model that combines 1D Convolutional Neural Networks (CNN) and Temporal Convolutional Networks (TCN) to jointly capture spatial and temporal features of eye-writing. To further improve model robustness, we incorporate data augmentation and initial-point normalization techniques. The proposed system was evaluated using our new webcam-captured Arabic numbers dataset and two existing benchmark datasets, with leave-one-subject-out (LOSO) cross-validation. The model achieved accuracies of 97.68% on the new dataset, 94.48% on the Japanese Katakana dataset, and 98.70% on the EOG-captured Arabic numbers dataset, outperforming existing systems. This work provides an efficient eye-writing recognition system, featuring robust preprocessing techniques, a hybrid deep learning model, and a new webcam-captured dataset.
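Two of the steps named in the abstract, initial-point normalization and LOSO cross-validation, can be illustrated with a short sketch. Both functions are assumptions made for illustration: this summary does not define initial-point normalization, so it is assumed here to mean translating each trajectory so that its first sample sits at the origin, and the `(subject_id, trajectory, label)` sample layout is hypothetical.

```python
def normalize_initial_point(trajectory):
    """Translate a 2D trajectory so its first point becomes (0, 0).

    Assumed interpretation of "initial-point normalization"; the paper's
    exact definition is not available in this summary.
    """
    if not trajectory:
        return []
    x0, y0 = trajectory[0]
    return [(x - x0, y - y0) for x, y in trajectory]


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) splits, one per subject.

    samples is a list of (subject_id, trajectory, label) tuples
    (hypothetical layout). Every sample from the held-out subject goes to
    the test split, so no subject appears in both train and test.
    """
    subjects = sorted({sample[0] for sample in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Hypothetical usage with toy data.
samples = [
    ("s1", [(3, 4), (5, 7)], 0),
    ("s1", [(1, 1), (2, 0)], 1),
    ("s2", [(0, 2), (4, 4)], 0),
]
for held_out, train, test in leave_one_subject_out(samples):
    print(held_out, len(train), len(test))
```

Splitting by subject rather than by sample means each reported accuracy is measured on a writer the model has never seen, which is a stricter test than a random split.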

## Full-text entities

- **Diseases:** eye-writing stroke (MESH:D020195), ALS (MESH:D000690), injury to (MESH:D014947), stroke (MESH:D020521), mobility impairments (MESH:D014086)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12568085/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12568085/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12568085/full.md

---
Source: https://tomesphere.com/paper/PMC12568085