Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction
Ali Farshian Abbasi, Aghil Yousefi-Koma, Soheil Dehghani Firouzabadi, Parisa Rashidi, Alireza Naeini

TL;DR
This paper presents the development and integration of Persian lip-reading technology into the Surena-V humanoid robot, enhancing its ability to understand speech in social and crowded environments through CNN and LSTM methods.
Contribution
It introduces a Persian lip-reading dataset, compares an indirect (landmark-based) method with a direct (CNN and LSTM) method, and deploys the best-performing model on a humanoid robot for real-time interaction.
Findings
LSTM model achieved 89% accuracy in lip reading.
Successful real-time implementation on Surena-V robot.
Enhanced robot communication in noisy or crowded settings.
Abstract
Lip reading is vital for robots in social settings, improving their ability to understand human communication. This skill allows them to communicate more easily in crowded environments, especially in caregiving and customer service roles. This study builds a Persian lip-reading dataset and integrates Persian lip-reading technology into the Surena-V humanoid robot to improve its speech recognition capabilities. Two complementary methods are explored: an indirect method using facial landmark tracking and a direct method leveraging convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. The indirect method tracks key facial landmarks, especially around the lips, to infer movements, while the direct method processes raw video data for action and speech recognition. The best-performing model, the LSTM, achieved 89% accuracy and has been successfully…
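As a rough illustration of the indirect method described above, the sketch below turns per-frame lip landmarks into a simple mouth-opening feature sequence of the kind a sequence model such as an LSTM could consume. It is not the paper's pipeline: the four landmark keys (`left`, `right`, `top`, `bottom`), the function names, and the choice of an opening-to-width ratio as the feature are all assumptions made for this example, and the coordinates are toy values rather than data from the Persian dataset.

```python
import math


def mouth_features(lip_points):
    """Return (width, opening, ratio) for one video frame.

    lip_points is a dict with the hypothetical keys 'left', 'right',
    'top' and 'bottom', each an (x, y) pixel coordinate of a lip landmark.
    """
    width = math.dist(lip_points["left"], lip_points["right"])
    opening = math.dist(lip_points["top"], lip_points["bottom"])
    # Dividing by the width makes the feature independent of face size.
    ratio = opening / width if width > 0 else 0.0
    return width, opening, ratio


def sequence_features(frames):
    """Map a list of per-frame landmark dicts to a list of opening ratios."""
    return [mouth_features(frame)[2] for frame in frames]


# Toy clip (made-up coordinates): the mouth opens progressively over 3 frames.
clip = [
    {"left": (0, 0), "right": (40, 0), "top": (20, -2), "bottom": (20, 2)},
    {"left": (0, 0), "right": (40, 0), "top": (20, -5), "bottom": (20, 5)},
    {"left": (0, 0), "right": (40, 0), "top": (20, -10), "bottom": (20, 10)},
]
ratios = sequence_features(clip)
```

In a real system the landmark coordinates would come from a face-landmark detector, and a richer feature vector per frame would likely be needed to separate visually similar words.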
Taxonomy
Topics: Social Robot Interaction and HRI · Hand Gesture Recognition Systems · Robotics and Automated Systems
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
