Gaze-based Object Detection in the Wild
Daniel Weber, Wolfgang Fuhl, Andreas Zell, Enkelejda Kasneci

TL;DR
This paper explores gaze-based object detection in realistic human-robot interaction scenarios, using heatmaps derived from gaze data and machine learning to identify objects and their bounding boxes efficiently.
Contribution
It introduces a gaze-based detection method that builds heatmaps from gaze data over variable temporal windows and grid sizes, and it reports advantages in speed and resource efficiency over conventional object detectors.
Findings
Effective object detection from gaze heatmaps in real-world scenarios
Method achieves high speed and low resource usage
Public dataset available for further research
Abstract
In human-robot collaboration, one challenging task is to teach a robot new, previously unknown objects so that it can interact with them. Gaze can contain valuable information for this purpose. We investigate whether it is possible to detect objects (object or no object) merely from gaze data and to determine their bounding box parameters. For this purpose, we explore different sizes of temporal windows, which serve as a basis for the computation of heatmaps, i.e., the spatial distribution of the gaze data. Additionally, we analyze different grid sizes of these heatmaps and demonstrate the functionality in a proof of concept using different machine learning techniques. Our method is characterized by its speed and resource efficiency compared to conventional object detectors. In order to generate the required data, we conducted a study with five subjects who could move freely and thus, turn towards arbitrary…
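The heatmap step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `gaze_heatmap`, the assumption that gaze coordinates are normalized to [0, 1], and the choice to normalize the heatmap so it sums to 1 are all hypothetical and were chosen only for illustration. The sketch keeps the gaze samples that fall inside a temporal window and bins them into a square grid.

```python
import numpy as np

def gaze_heatmap(gaze_xy, timestamps, t_end, window_s, grid_size):
    """Bin gaze samples from a temporal window into a grid_size x grid_size heatmap.

    Assumptions (not from the paper): gaze_xy holds (x, y) pairs normalized
    to [0, 1], timestamps are in seconds, and the result is normalized to sum to 1.
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float).reshape(-1, 2)
    timestamps = np.asarray(timestamps, dtype=float)

    # Keep only samples inside the window (t_end - window_s, t_end].
    mask = (timestamps > t_end - window_s) & (timestamps <= t_end)
    pts = gaze_xy[mask]

    # 2D histogram over the unit square; rows index y, columns index x.
    heat, _, _ = np.histogram2d(
        pts[:, 1], pts[:, 0], bins=grid_size, range=[[0.0, 1.0], [0.0, 1.0]]
    )

    total = heat.sum()
    return heat / total if total > 0 else heat

# Usage: five samples, a 1-second window ending at t = 2.0, and a 2 x 2 grid.
points = [(0.1, 0.1), (0.1, 0.1), (0.9, 0.9), (0.9, 0.9), (0.9, 0.1)]
times = [0.0, 0.5, 1.0, 1.5, 2.0]
heat = gaze_heatmap(points, times, t_end=2.0, window_s=1.0, grid_size=2)
```

Varying `window_s` and `grid_size` corresponds to the two parameters the abstract says are explored. The flattened heatmap could then serve as the input to a classifier or a bounding-box regressor.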
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
