Typing on Any Surface: A Deep Learning-based Method for Real-Time Keystroke Detection in Augmented Reality
Xingyu Fu, Mingze Xi

TL;DR
This paper introduces a deep learning-based method for real-time keystroke detection in augmented reality. It lets users type on any flat surface without a physical or virtual keyboard, reaching 91.05% accuracy at 40 WPM while processing video at approximately 32 FPS.
Contribution
The paper presents a novel adaptive C-RNN model that, combined with hand landmark extraction, predicts keystrokes accurately and in real time from user-perspective RGB video in AR.
Findings
Achieved 91.05% accuracy at 40 WPM
Processed video streams at approximately 32 FPS
Demonstrated viability for practical AR text entry
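As a rough sanity check on how the speed and frame-rate figures relate, the sketch below converts typing speed into the average number of video frames available per keystroke. It assumes the common convention of five characters per word, which this summary does not state, and the function name is purely illustrative.

```python
def frames_per_keystroke(wpm: float, fps: float, chars_per_word: float = 5.0) -> float:
    """Average number of video frames available per keystroke.

    chars_per_word=5 is the usual WPM convention (an assumption here,
    not something the summary specifies).
    """
    keystrokes_per_second = wpm * chars_per_word / 60.0
    return fps / keystrokes_per_second


# Reported figures: 40 WPM typing speed, approximately 32 FPS processing rate.
print(round(frames_per_keystroke(40, 32), 1))  # about 9.6 frames per keystroke
```

Under that assumption, the model sees roughly ten processed frames per keystroke at the reported operating point.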
Abstract
Frustrating text entry interfaces have been a major obstacle to participating in social activities in augmented reality (AR). Popular options, such as mid-air keyboard interfaces, wireless keyboards, or voice input, suffer from poor ergonomic design or limited accuracy, or are simply embarrassing to use in public. This paper proposes and validates a deep learning-based approach that enables AR applications to accurately predict keystrokes from the user-perspective RGB video stream that any AR headset can capture. This lets a user type on any flat surface and eliminates the need for a physical or virtual keyboard. A two-stage model, combining an off-the-shelf hand landmark extractor and a novel adaptive Convolutional Recurrent Neural Network (C-RNN), was trained using our newly built dataset. The final model was capable of adaptive processing…
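The two-stage structure described in the abstract can be sketched as a small streaming pipeline: each frame goes through a landmark extractor, and a sliding window of landmark vectors is passed to a sequence classifier. This is only a structural sketch under stated assumptions, not the authors' implementation. The class name, the window length, the landmark layout, and both stub functions are hypothetical, and the adaptive behaviour of the C-RNN is not modelled.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Landmarks = List[float]  # flattened hand keypoints for one frame (layout is an assumption)


class KeystrokePipeline:
    """Two-stage sketch: per-frame landmark extraction, then a sequence classifier."""

    def __init__(
        self,
        extract: Callable[[object], Optional[Landmarks]],
        classify: Callable[[List[Landmarks]], Optional[str]],
        window: int = 8,  # hypothetical window length, not taken from the paper
    ) -> None:
        self.extract = extract
        self.classify = classify
        self.buffer: Deque[Landmarks] = deque(maxlen=window)

    def push(self, frame: object) -> Optional[str]:
        """Feed one RGB frame; return a predicted key, or None if there is no keystroke."""
        landmarks = self.extract(frame)
        if landmarks is None:  # no hands detected: drop the temporal context
            self.buffer.clear()
            return None
        self.buffer.append(landmarks)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough frames buffered yet
        return self.classify(list(self.buffer))


# Stand-ins for the real stages, used only to make the sketch runnable.
def fake_extract(frame):
    return None if frame is None else [float(frame)] * 4


def fake_classify(seq):
    return "a" if seq[-1][0] > seq[0][0] else None


pipe = KeystrokePipeline(fake_extract, fake_classify, window=3)
results = [pipe.push(f) for f in [1, 2, 3, None, 4]]
print(results)
```

In a real system the first stub would be replaced by an actual hand landmark extractor and the second by a trained recurrent model. The bounded deque keeps memory constant while frames stream in.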
Taxonomy
Topics: Hand Gesture Recognition Systems · Tactile and Sensory Interactions · Gaze Tracking and Assistive Technology
Methods: Balanced Selection
