# Word-Level Motion Learning for Contactless QWERTY Typing with a Single Camera

**Authors:** Sung-Sic Yoo, Heung-Shik Lee

PMC · DOI: 10.3390/s26041087 · 2026-02-07

## TL;DR

This paper proposes a contactless typing method that recognizes whole words from finger motion patterns captured by a single RGB camera, evaluated in a fixed single-user, single-camera setup.

## Contribution

The novelty lies in modeling each typed word as a spatiotemporal motion pattern derived from hand joint trajectories, which gives stable word-level recognition with a single camera within the tested configuration.

## Key findings

- With vocabularies of up to 200 words, the proposed framework achieves stable word-level typing recognition using motion prototypes learned through repeated interaction.
- Learned motion representations can transfer from physical keyboards to flat-surface typing within the same experimental setting, even with reduced tactile feedback and visual layout cues.
- The method shows potential as a complement to character-based input systems in monocular sensing environments.

## Abstract

Contactless text entry is increasingly important in immersive and constrained computing environments, yet most vision-based approaches rely on character-level recognition or key localization, which are fragile under monocular sensing. This study investigates the feasibility of recognizing natural QWERTY typing motions directly at the word level using only a single RGB camera, under a fixed single-user and single-camera configuration. We propose a word-level contactless typing framework that models each word as a distinctive spatiotemporal finger motion pattern derived from hand joint trajectories. Typing motions are temporally segmented, and direction-aware finger displacements are accumulated to construct compact motion representations that are relatively insensitive to absolute hand position and typing duration within the evaluated setup. Each word is represented by multiple motion prototypes that are incrementally updated through online learning with a trial-delayed adaptation protocol. Experiments with vocabularies of up to 200 words show that the proposed approach progressively learns and recalls word-level motion patterns through repeated interaction, achieving stable recognition performance within the tested configuration at realistic typing speeds. Additional evaluations demonstrate that learned motion representations can transfer from physical keyboards to flat-surface typing within the same experimental setting, even when tactile feedback and visual layout cues are reduced. These results support the feasibility of reframing contactless typing as a word-level motion recall problem, and suggest its potential role as a complementary component to character-centric camera-based input methods under constrained monocular sensing.
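The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8-bin direction quantization, cosine-similarity matching, the merge threshold, the learning rate, and the names `motion_descriptor`, `WordPrototypeMemory`, and `run_session` are all assumptions made for illustration. "Trial-delayed adaptation" is read here as "predict on the current trial first, update the prototypes only afterwards", which may differ from the protocol in the paper. Temporal segmentation and hand joint extraction are assumed to have happened upstream, so each input is one already-segmented word trajectory.

```python
import numpy as np

def motion_descriptor(trajectory, n_bins=8):
    """Accumulate finger displacement per direction bin (assumed design).

    trajectory: array of shape (T, F, 2), image coordinates of F fingertips
    over T frames. Returns an L2-normalised vector of length F * n_bins.
    """
    traj = np.asarray(trajectory, dtype=float)
    disp = np.diff(traj, axis=0)                  # frame-to-frame motion; removes absolute position
    mag = np.linalg.norm(disp, axis=-1)           # (T-1, F) displacement lengths
    ang = np.arctan2(disp[..., 1], disp[..., 0])  # (T-1, F) angles in [-pi, pi]
    # Bins are centred on the quantised directions (0, 2*pi/n_bins, ...).
    bins = np.round(ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    n_fingers = traj.shape[1]
    hist = np.zeros((n_fingers, n_bins))
    for f in range(n_fingers):
        # Summing lengths (not counting frames) keeps the result
        # largely independent of how long the motion took.
        np.add.at(hist[f], bins[:, f], mag[:, f])
    vec = hist.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class WordPrototypeMemory:
    """Several prototypes per word, matched by cosine similarity (assumed)."""

    def __init__(self, max_prototypes=3, merge_threshold=0.9, learning_rate=0.2):
        self.max_prototypes = max_prototypes
        self.merge_threshold = merge_threshold
        self.learning_rate = learning_rate
        self.prototypes = {}  # word -> list of unit vectors

    def predict(self, descriptor):
        best_word, best_sim = None, -1.0
        for word, protos in self.prototypes.items():
            for p in protos:
                sim = float(np.dot(p, descriptor))
                if sim > best_sim:
                    best_word, best_sim = word, sim
        return best_word, best_sim

    def update(self, word, descriptor):
        protos = self.prototypes.setdefault(word, [])
        if protos:
            sims = [float(np.dot(p, descriptor)) for p in protos]
            i = int(np.argmax(sims))
            # Merge into the nearest prototype if it is close enough,
            # or if the word already holds the maximum number of prototypes.
            if sims[i] >= self.merge_threshold or len(protos) >= self.max_prototypes:
                merged = (1 - self.learning_rate) * protos[i] + self.learning_rate * descriptor
                norm = np.linalg.norm(merged)
                protos[i] = merged / norm if norm > 0 else merged
                return
        protos.append(np.array(descriptor, dtype=float))

def run_session(memory, trials):
    """Predict each trial before learning from it (assumed reading of
    'trial-delayed adaptation'). trials: iterable of (word, trajectory)."""
    outcomes = []
    for word, trajectory in trials:
        d = motion_descriptor(trajectory)
        predicted, _ = memory.predict(d)
        outcomes.append(predicted == word)
        memory.update(word, d)  # adaptation happens only after scoring
    return outcomes
```

In this sketch, differencing consecutive frames discards absolute hand position, and accumulating displacement length per direction means a slower rendition of the same path yields roughly the same descriptor. Keeping several prototypes per word lets one word have more than one habitual motion variant without averaging them together.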

## Full-text entities

- **Diseases:** injury to (MESH:D014947), Delayed Learning (MESH:D007859)
- **Chemicals:** chlorine (MESH:D002713)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944356/full.md

---
Source: https://tomesphere.com/paper/PMC12944356