In-Materia Speech Recognition

Mohamadreza Zolfagharinejad; Julian Büchel; Lorenzo Cassola; Sachin Kinge; Ghazi Sarwat Syed; Abu Sebastian; Wilfred G. van der Wiel

arXiv:2410.10434 · eess.AS · September 23, 2025

Open Access

TL;DR

This paper introduces an in-materia edge speech recognition system combining analogue feature extraction and in-memory neural network classification, achieving high accuracy with ultra-low power consumption suitable for edge devices.

Contribution

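The paper presents a novel in-materia computing hardware architecture that integrates a dopant network processing unit (DNPU) for analogue feature extraction with memristive crossbar arrays for analogue in-memory computing (AIMC) classification, targeting efficient, low-power speech recognition at the edge.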

Findings

01. Achieved 96.2% accuracy on the TI-46-Word speech recognition task.

02. DNPU feature extraction consumes only hundreds of nanowatts.

03. AIMC classification potentially uses less than 10 femtojoules per operation.
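
To give a feel for the scale of these figures, the following minimal sketch turns a per-operation energy and a feature-extraction power into a rough per-utterance energy estimate. Only the 10 fJ per operation bound and the "hundreds of nanowatts" order of magnitude come from the findings above; the operation count, the 500 nW power value, the 1 s utterance duration, and the function names are hypothetical values chosen purely for illustration.

```python
# Rough energy-budget arithmetic. Everything marked "assumed" is a
# hypothetical placeholder, not a value reported in the paper.

FEMTOJOULE = 1e-15  # joules
NANOWATT = 1e-9     # watts


def classification_energy_joules(num_ops: int,
                                 energy_per_op_j: float = 10 * FEMTOJOULE) -> float:
    """Upper-bound classifier energy: operation count times energy per operation."""
    return num_ops * energy_per_op_j


def feature_extraction_energy_joules(power_w: float, duration_s: float) -> float:
    """Energy drawn by a constant-power front end over one utterance."""
    return power_w * duration_s


if __name__ == "__main__":
    assumed_ops = 100_000            # assumed operations per inference
    assumed_power = 500 * NANOWATT   # assumed value within "hundreds of nanowatts"
    assumed_duration = 1.0           # assumed utterance length in seconds

    e_cls = classification_energy_joules(assumed_ops)
    e_feat = feature_extraction_energy_joules(assumed_power, assumed_duration)
    print(f"classification: {e_cls:.3e} J")
    print(f"feature extraction: {e_feat:.3e} J")
```

Under these assumed inputs, classification costs about 1 nJ per inference and feature extraction about 500 nJ per utterance. The real figures depend on the network size and utterance length used in the paper, which this summary does not state.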

Abstract

With the rise of decentralized computing, as in the Internet of Things, autonomous driving, and personalized healthcare, it is increasingly important to process time-dependent signals at the edge efficiently: right at the place where the temporal data are collected, avoiding time-consuming, insecure, and costly communication with a centralized computing facility (or cloud). However, modern-day processors often cannot meet the restrained power and time budgets of edge systems because of intrinsic limitations imposed by their architecture (von Neumann bottleneck) or domain conversions (analogue-to-digital and time-to-frequency). Here, we propose an edge temporal-signal processor based on two in-materia computing systems for both feature extraction and classification, reaching a software-level accuracy of 96.2% for the TI-46-Word speech-recognition task. First, a nonlinear,…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing