# TENG-Based Self-Powered Silent Speech Recognition Interface: from Assistive Communication to Immersive AR/VR Interaction

**Authors:** Shuai Lin, Yanmin Guo, Xiangyao Zeng, Xiongtu Zhou, Yongai Zhang, Chengda Li, Chaoxing Wu

PMC · DOI: 10.1007/s40820-025-01982-z · Nano-Micro Letters · 2026-01-12

## TL;DR

This paper introduces a self-powered system that captures jaw movements with a triboelectric sensor, decodes them as silent speech, and turns the recognized words into device commands. Potential applications include assistive communication and AR/VR interaction.

## Contribution

The paper proposes a porous pyramid-structured triboelectric sensor and a hybrid CNN-LSTM neural network for silent speech recognition.

## Key findings

- The sensor achieves high sensitivity for low-force jaw movement detection: 1 V N⁻¹ over 0–10 N and 4.6 V N⁻¹ over 10–24 N.
- The CNN-LSTM model achieves 95.83% classification accuracy across 30 categories of daily words.
- The system enables real-time, contactless smartphone and AR/VR control.
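
The two reported sensitivities describe a piecewise-linear response. The minimal sketch below turns them into a force-to-voltage estimate and its inverse. It assumes, beyond what the summary states, that the output is zero at zero force and that the response is continuous at the 10 N breakpoint; the function and constant names are hypothetical.

```python
# Minimal sketch of the FPS response built only from the two reported
# sensitivities. ASSUMPTIONS (not from the paper): zero output at zero
# force, and a response that is continuous at the 10 N breakpoint.

S_LOW = 1.0     # V/N, reported sensitivity for 0-10 N
S_HIGH = 4.6    # V/N, reported sensitivity for 10-24 N
BREAK_N = 10.0  # N, boundary between the two regimes
MAX_N = 24.0    # N, upper end of the reported range


def estimated_voltage(force_n: float) -> float:
    """Estimated output voltage (V) for an applied force (N)."""
    if not 0.0 <= force_n <= MAX_N:
        raise ValueError("force outside the reported 0-24 N range")
    if force_n <= BREAK_N:
        return S_LOW * force_n
    # Above the breakpoint, add the high-regime slope on top of the 10 N value.
    return S_LOW * BREAK_N + S_HIGH * (force_n - BREAK_N)


def estimated_force(voltage_v: float) -> float:
    """Invert the piecewise model to recover force (N) from voltage (V)."""
    v_break = S_LOW * BREAK_N
    if not 0.0 <= voltage_v <= estimated_voltage(MAX_N):
        raise ValueError("voltage outside the modelled range")
    if voltage_v <= v_break:
        return voltage_v / S_LOW
    return BREAK_N + (voltage_v - v_break) / S_HIGH
```

Under these assumptions, 5 N maps to about 5 V and 20 N to about 56 V (10 V from the first regime plus 4.6 V N⁻¹ × 10 N). A real calibration would fit offsets and nonlinearity from measured data rather than assume them.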

## Abstract

A porous pyramid-structured triboelectric nanogenerator sensor is designed for self-powered silent speech signal acquisition.

A hybrid neural network that combines a convolutional neural network (CNN) with long short-term memory (LSTM) is proposed to accurately decode silent speech signals.

Silent speech commands enable real-time, contactless control of smartphones and immersive AR/VR interaction.
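
The summary does not give the network's layer counts, kernel sizes, hidden size, or preprocessing, so the sketch below is a generic, untrained CNN-LSTM forward pass in plain Python, not the authors' model. It shows the division of labour the paper describes: a 1-D convolution extracts local features from the sensor trace, and an LSTM models how those features evolve over time before a dense layer scores 30 classes. All sizes, names, and the synthetic input are hypothetical, and the random weights make the output meaningless until trained.

```python
import math
import random


def conv1d_relu(signal, kernels, biases):
    """Valid 1-D convolution of a single-channel trace, then ReLU.

    Returns one feature vector (one value per kernel) per output time step.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(signal) - k + 1):
        window = signal[t:t + k]
        out.append([max(0.0, sum(w * x for w, x in zip(kern, window)) + b)
                    for kern, b in zip(kernels, biases)])
    return out


def max_pool(seq, size):
    """Non-overlapping max pooling along time, independently per channel."""
    channels = len(seq[0])
    return [[max(step[c] for step in seq[i:i + size]) for c in range(channels)]
            for i in range(0, len(seq) - size + 1, size)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lstm_last_hidden(seq, weights, biases, hidden):
    """Run a single-layer LSTM over seq and return the final hidden state.

    weights has 4*hidden rows (input, forget, cell, output gates), each
    acting on the concatenation [x_t, h_{t-1}].
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(weights, biases)]
        i = [sigmoid(p) for p in pre[:hidden]]
        f = [sigmoid(p) for p in pre[hidden:2 * hidden]]
        g = [math.tanh(p) for p in pre[2 * hidden:3 * hidden]]
        o = [sigmoid(p) for p in pre[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n_kernels=4, kernel_size=5, hidden=8, n_classes=30, seed=0):
    """Random, untrained weights; every size here is hypothetical."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": mat(n_kernels, kernel_size), "conv_b": [0.0] * n_kernels,
        "lstm_W": mat(4 * hidden, n_kernels + hidden), "lstm_b": [0.0] * (4 * hidden),
        "dense_W": mat(n_classes, hidden), "dense_b": [0.0] * n_classes,
        "hidden": hidden,
    }


def classify(signal, params, pool=2):
    """CNN features -> pooling -> LSTM over time -> dense layer -> softmax."""
    feats = conv1d_relu(signal, params["kernels"], params["conv_b"])
    feats = max_pool(feats, pool)
    h = lstm_last_hidden(feats, params["lstm_W"], params["lstm_b"], params["hidden"])
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(params["dense_W"], params["dense_b"])]
    return softmax(logits)


if __name__ == "__main__":
    params = init_params()
    # Synthetic stand-in for one jaw-movement voltage trace (64 samples).
    trace = [math.sin(0.2 * t) for t in range(64)]
    probs = classify(trace, params)
    print(len(probs), round(sum(probs), 6))
```

Placing the convolution before the LSTM shortens the sequence the recurrent layer must process (64 samples become 30 pooled steps here) and hands it local features rather than raw samples. A practical implementation would use a framework such as PyTorch and train the weights on labelled recordings.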

The online version contains supplementary material available at 10.1007/s40820-025-01982-z.

Lip language provides a silent, intuitive, and efficient mode of communication, offering a promising solution for individuals with speech impairments. Its articulation relies on complex movements of the jaw and the surrounding muscles. However, acquiring these movements accurately and in real time, and decoding them into reliable silent speech signals, remains a significant challenge. In this work, we propose a real-time silent speech recognition system that integrates a triboelectric nanogenerator-based flexible pressure sensor (FPS) with a deep learning framework. The FPS employs a porous pyramid-structured silicone film as the negative triboelectric layer, enabling highly sensitive pressure detection in the low-force regime (1 V N⁻¹ for 0–10 N and 4.6 V N⁻¹ for 10–24 N). This allows it to precisely capture jaw movements during speech and convert them into electrical signals. To decode the signals, we propose a convolutional neural network-long short-term memory (CNN–LSTM) hybrid network that combines CNN and LSTM models to extract both local spatial features and temporal dynamics. The model achieved 95.83% classification accuracy across 30 categories of daily words. Furthermore, the decoded silent speech signals can be directly translated into executable commands for contactless and precise control of a smartphone. The system can also be connected to AR glasses, offering a novel human–machine interaction approach with promising potential in AR/VR applications.
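
Translating decoded words into executable commands can be as simple as a lookup from the predicted label to an action. The sketch below is a hypothetical illustration, not the paper's control pipeline: the three-word vocabulary, the string "actions", and the 0.8 confidence threshold used to reject uncertain predictions are all assumptions, since the summary lists neither the 30 words nor the command mapping.

```python
from typing import Callable, Dict, Optional, Sequence


def dispatch(probs: Sequence[float], labels: Sequence[str],
             bindings: Dict[str, Callable[[], str]],
             threshold: float = 0.8) -> Optional[str]:
    """Run the action bound to the most probable word.

    Returns None when the top probability is below `threshold` (an assumed
    rejection rule) or when the recognized word has no bound action.
    """
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < threshold:
        return None  # reject uncertain predictions rather than act on them
    action = bindings.get(labels[best])
    return action() if action is not None else None


# Hypothetical 3-word vocabulary and bindings; the paper's word list and
# command mapping are not given in this summary.
LABELS = ["call", "music", "photo"]
BINDINGS = {
    "call": lambda: "open dialer",
    "photo": lambda: "open camera",
}

print(dispatch([0.05, 0.05, 0.90], LABELS, BINDINGS))  # open camera
print(dispatch([0.40, 0.35, 0.25], LABELS, BINDINGS))  # None: below threshold
print(dispatch([0.05, 0.90, 0.05], LABELS, BINDINGS))  # None: "music" is unbound
```

The actions return strings only to keep the sketch self-contained; on a phone or AR headset they would call the platform's own APIs. Rejecting low-confidence predictions trades some responsiveness for fewer unintended commands.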

## Full-text entities

- **Diseases:** speech impairments (MESH:D013064)
- **Chemicals:** silicone (MESH:D012828)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12791100/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12791100/full.md

---
Source: https://tomesphere.com/paper/PMC12791100