TS-RGBD Dataset: a Novel Dataset for Theatre Scenes Description for People with Visual Impairments

Leyla Benhamida; Khadidja Delloul; Slimane Larabi

arXiv:2308.01035 · cs.CV · August 3, 2023 · 1 citation

TL;DR

This paper introduces TS-RGBD, a new RGB-D dataset with theatre scenes, enabling improved image captioning and human action recognition for aiding visually impaired individuals in complex environments.

Contribution

The paper presents a novel RGB-D dataset for theatre scenes, including ground truth annotations for actions and captions, expanding computer vision applications for visually impaired assistance.

Findings

1. Image captioning models perform well on theatre scenes
2. Skeleton-based action recognition models are effective in this context
3. Dataset enables better scene understanding for visually impaired aid

Abstract

Computer vision has long been used to help visually impaired people move around their environment and avoid obstacles and falls. Existing solutions are limited to either indoor or outdoor scenes, which restricts the kinds of places visually impaired people can visit, including entertainment venues such as theatres. Furthermore, most proposed computer-vision-based methods train their models on RGB benchmarks, which limits performance because the depth modality is absent. In this paper, we propose TS-RGBD, a novel RGB-D dataset of theatre scenes with ground-truth human action and dense caption annotations for image captioning and human action recognition. It includes three types of data (RGB, depth, and skeleton sequences) captured by Microsoft Kinect. We test image captioning models on our dataset as well as some skeleton-based…

Code & Models

Repositories

khadidja-delloul/rgb-d-theatre-scenes-dataset (official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems