Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul, Slimane Larabi

TL;DR
This paper proposes a real-time egocentric image captioning system for the blind and visually impaired, providing spatially aware descriptions of scenes in RGB-D theatre images to enhance scene understanding.
Contribution
It introduces a novel approach that generates spatially detailed scene descriptions, including object positions and relationships, tailored for egocentric RGB-D images in theatre environments.
Findings
Effective spatially aware captions generated in real-time
Enhanced scene understanding for visually impaired users
Application demonstrated on theatre RGB-D dataset
Abstract
In recent years, image captioning and segmentation have emerged as crucial tasks in computer vision, with applications ranging from autonomous driving to content analysis. Although many solutions help blind and visually impaired people move around their environment, few applications help them understand and mentally reconstruct a scene through text. Most existing models focus on navigation and obstacle avoidance, which restricts the environments blind and visually impaired people can access. In this paper, we propose an approach that helps them understand their surroundings using image captioning. The particularity of our research is that we describe the positions of regions and objects relative to the user (left, right, front), as well as positional relationships between regions, while we aim to give them access to theatre plays by…
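The abstract mentions describing where regions sit relative to the user (left, right, front) and how regions relate to each other. As a rough illustration only, and not the authors' method, the sketch below shows one way such phrases could be derived from segmented regions of an RGB-D image. The `Region` fields, the split of the image into thirds, the 0.5 m depth margin, and all function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A segmented region (hypothetical structure, not from the paper)."""
    label: str
    x_min: float    # left edge of the bounding box, in pixels
    x_max: float    # right edge of the bounding box, in pixels
    depth_m: float  # representative depth from the depth channel, in metres


def centre_x(region: Region) -> float:
    """Horizontal centre of the region's bounding box."""
    return (region.x_min + region.x_max) / 2.0


def egocentric_position(region: Region, image_width: float) -> str:
    """Classify a region as left, front, or right of the viewer.

    Assumption: the image is split into three equal vertical bands.
    """
    cx = centre_x(region)
    if cx < image_width / 3.0:
        return "left"
    if cx > 2.0 * image_width / 3.0:
        return "right"
    return "front"


def pairwise_relation(a: Region, b: Region, depth_margin_m: float = 0.5) -> str:
    """Describe where region `a` is relative to region `b`.

    Depth is checked first; if the two regions lie at a similar depth
    (within the assumed margin), the horizontal order is used instead.
    """
    if a.depth_m + depth_margin_m < b.depth_m:
        return "in front of"
    if b.depth_m + depth_margin_m < a.depth_m:
        return "behind"
    return "to the left of" if centre_x(a) < centre_x(b) else "to the right of"


def describe(a: Region, b: Region, image_width: float) -> str:
    """Build one spatially aware sentence about `a` with respect to `b`."""
    phrases = {
        "left": "on your left",
        "right": "on your right",
        "front": "in front of you",
    }
    where = phrases[egocentric_position(a, image_width)]
    return f"The {a.label} is {where}, {pairwise_relation(a, b)} the {b.label}."


if __name__ == "__main__":
    actor = Region("actor", x_min=100, x_max=200, depth_m=3.0)
    curtain = Region("curtain", x_min=500, x_max=640, depth_m=6.0)
    print(describe(actor, curtain, image_width=640))
```

A real system would take the regions from a segmentation model and the depth values from the sensor; the fixed thresholds here only serve to make the idea concrete.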
Taxonomy
Topics
Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
