Romanian Speech Recognition Experiments from the ROBIN Project
Andrei-Marius Avram, Vasile Păiș, Dan Tufiș

TL;DR
This paper presents a low-latency deep neural network-based Romanian speech recognition system achieving state-of-the-art accuracy, along with modules for output correction, integrated into a modular API-driven architecture for social robot dialogue applications.
Contribution
It introduces a fast, reliable deep neural network model for Romanian speech recognition with state-of-the-art accuracy and a modular correction architecture integrated into a platform for social robots.
Findings
Achieved 9.91% WER on Romanian speech recognition
Developed modules for output correction including hyphen, capitalization, and unknown words
Integrated the system into the RELATE platform for web-based speech processing
Abstract
A fundamental requirement for the acceptance of a socially assistive robot is its ability to communicate with other agents in the environment. In the context of the ROBIN project, situational dialogue through voice interaction with a robot was investigated. This paper presents speech recognition experiments with deep neural networks, focusing on producing models that are fast (under 100 ms latency from the network itself) yet still reliable. Even though low latency is one of the key desired characteristics, the final deep neural network model achieves state-of-the-art results for Romanian speech recognition, obtaining a 9.91% word error rate (WER) when combined with a language model, thus improving over previous results while also offering improved runtime performance. Additionally, we explore two modules for correcting the ASR output (hyphen and…
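For readers unfamiliar with the metric quoted above, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name and the example sentences are hypothetical, and text normalization is assumed to be plain whitespace tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are already normalized; words are split on
    whitespace only. This is an illustrative sketch, not the paper's scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one reference word is dropped by the recognizer.
example_wer = word_error_rate("robotul merge la bucatarie",
                              "robotul merge bucatarie")
print(f"{example_wer:.2%}")
```

Under this definition, a 9.91% WER corresponds to roughly one word-level error per ten reference words, averaged over the test set.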
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Natural Language Processing Techniques
