Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction
Leyla Benhamida, Slimane Larabi

TL;DR
This paper contributes to the development of a tactile device for visually impaired and blind people. It recognizes human actions from RGB-D data by fusing skeleton and depth modalities, labels them semantically, and is intended to map them to an output perceivable by touch.
Contribution
It introduces a multi-modal approach that fuses skeleton data (processed with MS-G3D) and depth data (processed with a CNN) for action recognition tailored to visually impaired users.
Findings
Skeleton-only recognition with MS-G3D showed constraints and limitations when tested on real scenes.
Fusion of skeleton and depth modalities improves accuracy.
Semantic labeling enables tactile interaction.
Abstract
The aim of this work is to contribute to the development of a tactile device that lets visually impaired and blind persons understand the actions of surrounding people and interact with them. First, building on state-of-the-art methods for human action recognition from RGB-D sequences, we use the skeleton information provided by Kinect with the disentangled and unified multi-scale graph convolutional (MS-G3D) model to recognize the performed actions. We tested this model on real scenes and identified several constraints and limitations. Second, we fuse the skeleton modality (MS-G3D) with the depth modality (CNN) to bypass these limitations. Third, the recognized actions are labeled semantically and will be mapped to an output device perceivable by touch.
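The abstract does not say how the skeleton and depth streams are combined, so the following is only a minimal sketch under one common assumption: score-level (late) fusion, where each model's class scores are turned into probabilities and averaged with a weight. The action vocabulary, the function names, and the `skeleton_weight` parameter are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical action vocabulary; the paper's actual label set is not given here.
ACTIONS = ["waving", "handshake", "walking toward"]


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(skeleton_logits, depth_logits, skeleton_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: late fusion at the score level. skeleton_logits stands in
    for the skeleton-stream output and depth_logits for the depth-stream output.
    """
    if len(skeleton_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= skeleton_weight <= 1.0:
        raise ValueError("skeleton_weight must lie in [0, 1]")
    p_skel = softmax(skeleton_logits)
    p_depth = softmax(depth_logits)
    w = skeleton_weight
    return [w * a + (1.0 - w) * b for a, b in zip(p_skel, p_depth)]


def label_action(skeleton_logits, depth_logits, skeleton_weight=0.5):
    """Return the semantic label with the highest fused probability."""
    fused = fuse_scores(skeleton_logits, depth_logits, skeleton_weight)
    if len(fused) != len(ACTIONS):
        raise ValueError("score length must match the action vocabulary")
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIONS[best], fused[best]


if __name__ == "__main__":
    # The skeleton stream is nearly undecided between the first two classes;
    # the depth stream favors the second, which settles the fused decision.
    print(label_action([2.0, 1.9, 0.0], [0.0, 3.0, 0.0]))
```

In this sketch the depth stream resolves a case where skeleton scores alone are ambiguous, which is the kind of limitation the fusion step is meant to bypass. The returned text label is what a downstream tactile mapping would consume.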
Taxonomy
Topics: Human Pose and Action Recognition · Tactile and Sensory Interactions · Hand Gesture Recognition Systems
