TL;DR
GaTector is a unified framework that performs gaze prediction and object detection jointly over shared features, using several new modules to improve gaze object prediction on the GOO dataset.
Contribution
GaTector introduces a shared backbone flanked by input-specific and task-specific blocks, a Defocus layer for object-specific features, and a new wUoC metric. Sharing the backbone allows the gaze and object detection branches to be optimized jointly.
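The layout of input-specific blocks, a shared backbone, and task-specific blocks can be illustrated with a toy sketch. This is not the authors' implementation. The `Block` and `SGSExtractor` names, the affine "layers", and the routing of features to each task (scene plus head features for gaze, scene features alone for detection) are assumptions made purely for illustration. Real blocks would be convolutional stacks.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Block:
    """Hypothetical stand-in for a stack of layers: an elementwise affine map."""
    name: str
    scale: float
    shift: float
    calls: int = 0  # counts forward passes, to show which blocks are shared

    def __call__(self, x: Vector) -> Vector:
        self.calls += 1
        return [self.scale * v + self.shift for v in x]


class SGSExtractor:
    """Toy specific -> general -> specific feature extractor (illustrative only)."""

    def __init__(self) -> None:
        # Specific: one block per input type.
        self.scene_block = Block("scene_specific", 1.0, 0.5)
        self.head_block = Block("head_specific", 0.5, 0.0)
        # General: a single backbone reused for both inputs.
        self.backbone = Block("shared_backbone", 2.0, 0.0)
        # Specific: one block per task.
        self.gaze_block = Block("gaze_specific", 1.0, -1.0)
        self.detect_block = Block("detection_specific", 1.0, 1.0)

    def forward(self, scene: Vector, head: Vector) -> Tuple[Vector, Vector]:
        # Both inputs pass through the same backbone object, so its
        # parameters would receive gradients from both branches.
        scene_general = self.backbone(self.scene_block(scene))
        head_general = self.backbone(self.head_block(head))
        # Assumed routing: gaze uses scene + head features, detection uses scene only.
        fused = [s + h for s, h in zip(scene_general, head_general)]
        return self.gaze_block(fused), self.detect_block(scene_general)


if __name__ == "__main__":
    extractor = SGSExtractor()
    gaze_feat, det_feat = extractor.forward([1.0, 2.0], [2.0, 4.0])
    print("gaze features:", gaze_feat)
    print("detection features:", det_feat)
    print("backbone forward passes:", extractor.backbone.calls)
```

The point of the sketch is structural: there is exactly one `backbone` object, so replacing two separate feature networks with it reduces the parameter count and ties the two branches to a common set of weights.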
Findings
Outperforms previous methods in object detection, gaze estimation, and gaze object prediction.
Effectively utilizes shared features to reduce network complexity.
Achieves superior results on the GOO dataset across all tasks.
Abstract
Gaze object prediction is a newly proposed task that aims to discover the objects being stared at by humans. It has great practical significance but still lacks a unified solution framework. An intuitive solution is to incorporate an object detection branch into an existing gaze prediction method. However, previous gaze prediction methods usually use two different networks to extract features from the scene image and the head image, which leads to a heavy network architecture and prevents the branches from being jointly optimized. In this paper, we build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way. In particular, a specific-general-specific (SGS) feature extractor is first proposed, which uses a shared backbone to extract general features for both scene and head images. To better account for the specificity of inputs and tasks, SGS introduces two…
Taxonomy
Methods: Heatmap
