Towards an Embodied Semantic Fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos
Mickey Li, Noyan Songur, Pavel Orlov, Stefan Leutenegger, A. Aldo Faisal

TL;DR
This paper presents a real-time system that combines 3D scene reconstruction, semantic labeling, and gaze estimation from ego-centric RGB-D videos, advancing understanding of human-environment interactions in everyday tasks.
Contribution
It introduces a novel approach augmenting Semantic SLAM with gaze vectors for improved 3D semantic mapping from ego-centric videos.
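To make the idea of attaching gaze to a semantic map concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera model, a metric depth reading at the gaze pixel, a camera pose supplied by the SLAM system, and a map stored as labelled 3D points. All function names, parameters, and the 0.1 m distance threshold are hypothetical and chosen for illustration only.

```python
import math


def gaze_pixel_to_camera_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a gaze pixel (u, v) with metric depth into the camera frame.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy); these intrinsics are illustrative, not from the paper.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def camera_to_world(point_c, rotation, translation):
    """Apply the rigid transform p_w = R * p_c + t.

    `rotation` is a 3x3 nested list and `translation` a 3-tuple, standing in
    for the camera pose a SLAM system would estimate.
    """
    return tuple(
        sum(rotation[i][j] * point_c[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def label_at_gaze(point_w, labelled_points, max_dist=0.1):
    """Return the label of the nearest map point within max_dist metres, else None.

    `labelled_points` is a list of ((x, y, z), label) pairs, a simplified
    stand-in for a semantic 3D map.
    """
    best_label, best_dist = None, max_dist
    for position, label in labelled_points:
        d = math.dist(point_w, position)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical usage: the gaze lands on the image centre, 2 m away.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gaze_c = gaze_pixel_to_camera_point(320, 240, 2.0, 525.0, 525.0, 320.0, 240.0)
gaze_w = camera_to_world(gaze_c, IDENTITY, (1.0, 0.0, 0.0))
toy_map = [((1.0, 0.0, 2.05), "mug"), ((3.0, 0.0, 2.0), "chair")]
fixated = label_at_gaze(gaze_w, toy_map)
```

Because the gaze point is expressed in world coordinates, it stays attached to the map when the wearer looks away, which is what would allow objects outside the current field of view to be tracked.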
Findings
Successfully produced semantic 3D maps from NYUv2 dataset images
Achieved reasonable accuracy in 3D object tracking and gaze estimation
Demonstrated real-time 3D mapping with semantic labels and gaze data
Abstract
Incorporating the physical environment is essential for a complete understanding of human behavior in unconstrained everyday tasks. This is especially important in ego-centric tasks, where obtaining 3-dimensional information is both limiting and challenging and current 2D video analysis methods prove insufficient. Here we demonstrate a proof-of-concept system that provides real-time 3D mapping and semantic labeling of the local environment from an ego-centric RGB-D video stream, with 3D gaze point estimation from head-mounted eye-tracking glasses. We augment existing work in Semantic Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze vectors. Our system can then find and track objects both inside and outside the user's field of view in 3D from multiple perspectives with reasonable accuracy. We validate our concept by producing a semantic map from images of…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection
