Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images

Khadidja Delloul; Slimane Larabi

arXiv:2308.13892·cs.CV·August 29, 2023

TL;DR

This paper proposes a real-time egocentric image captioning system for the blind and visually impaired, providing spatially aware descriptions of scenes in RGB-D theatre images to enhance scene understanding.

Contribution

It introduces a novel approach that generates spatially detailed scene descriptions, including object positions and relationships, tailored for egocentric RGB-D images in theatre environments.

Findings

01. Effective spatially aware captions generated in real time

02. Enhanced scene understanding for visually impaired users

03. Application demonstrated on a theatre RGB-D dataset

Abstract

In recent years, image captioning and segmentation have emerged as crucial tasks in computer vision, with applications ranging from autonomous driving to content analysis. Although many solutions help blind and visually impaired people move around their environment, few help them understand a scene and rebuild it in their minds through text. Most existing models focus on helping users move and avoid obstacles, which restricts the number of environments blind and visually impaired people can be in. In this paper, we propose an approach that helps them understand their surroundings using image captioning. The particularity of our research is that we offer descriptions that give the positions of regions and objects relative to the user (left, right, front), as well as positional relationships between regions, while we aim to give them access to theatre plays by…
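
The position vocabulary mentioned in the abstract (left, right, front) can be illustrated with a minimal sketch. This is not the authors' method: the three-equal-bands rule, the mean-depth comparison, and every function name below (`egocentric_position`, `relation`, and the helpers) are assumptions made for illustration only. Masks are plain 2D lists of 0/1 values, and `depth` is a 2D list of distances aligned with the mask.

```python
from statistics import mean


def centroid_column(mask):
    """Mean column index of the non-zero pixels in a 2D mask (list of rows)."""
    cols = [x for row in mask for x, on in enumerate(row) if on]
    if not cols:
        raise ValueError("mask contains no pixels")
    return mean(cols)


def egocentric_position(mask):
    """Label a segment 'left', 'front' or 'right' relative to the camera wearer.

    Assumption: the image is split into three equal vertical bands and the
    segment takes the label of the band that contains its centroid.
    """
    width = len(mask[0])
    cx = centroid_column(mask)
    if cx < width / 3:
        return "left"
    if cx >= 2 * width / 3:
        return "right"
    return "front"


def mean_depth(mask, depth):
    """Average depth value over the pixels covered by the mask."""
    values = [
        depth[y][x]
        for y, row in enumerate(mask)
        for x, on in enumerate(row)
        if on
    ]
    return mean(values)


def relation(mask_a, mask_b, depth):
    """Describe segment A relative to segment B.

    Assumption: horizontal order comes from the centroids, and the segment
    with the smaller mean depth is the one closer to the viewer.
    """
    if centroid_column(mask_a) < centroid_column(mask_b):
        horizontal = "left of"
    else:
        horizontal = "right of"
    if mean_depth(mask_a, depth) < mean_depth(mask_b, depth):
        closeness = "in front of"
    else:
        closeness = "behind"
    return horizontal, closeness
```

A caption generator could combine these labels with a per-segment description, for example "on your left, ...", but how the paper actually composes its captions is not stated in the text available here.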

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Methods: Focus