Human Gaze Boosts Object-Centered Representation Learning

Timothy Schaumlöffel; Arthur Aubret; Gemma Roig; Jochen Triesch

arXiv:2501.02966·cs.CV·January 7, 2025

TL;DR

Inspired by how human vision amplifies central visual information, this study shows that emphasizing the region around gaze points in egocentric videos improves object-centered representation learning, and that gaze dynamics contribute to stronger visual features.

Contribution

The paper introduces a gaze-centered cropping approach, inspired by human central vision, for SSL models trained on egocentric videos, and demonstrates that it improves object-centered representations.

Findings

01

Focusing on gaze-centered regions enhances object representation quality.

02

Temporal gaze dynamics contribute to stronger visual features.

03

Gaze-based cropping outperforms uniform visual inputs in SSL training.

Abstract

Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform on image recognition tasks compared to humans. These models train on raw, uniform visual inputs collected from head-mounted cameras. This differs from human vision, where the anatomical structure of the retina and visual cortex relatively amplifies central visual information, i.e., the region around the gaze location. This selective amplification likely aids in forming object-centered visual representations. Here, we investigate whether focusing on central visual information boosts egocentric visual object learning. We simulate 5 months of egocentric visual experience using the large-scale Ego4D dataset and generate gaze locations with a human gaze prediction model. To account for the importance of central vision in humans, we crop the visual area around the gaze…
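The cropping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and gives neither the crop size nor how gaze points near the frame border are handled, so the function names `gaze_crop_box` and `crop_frame`, the square crop, and the shift-to-fit border rule below are all assumptions made for illustration, not the authors' implementation.

```python
def gaze_crop_box(frame_w, frame_h, gaze_x, gaze_y, crop_size):
    """Return (left, top, right, bottom) of a square crop centered on the gaze.

    Hypothetical helper: when the gaze is near a border the box is shifted,
    not shrunk, so the crop always has the requested size. This border rule
    is an assumption, not something stated in the paper summary.
    """
    if crop_size > min(frame_w, frame_h):
        raise ValueError("crop_size must fit inside the frame")
    half = crop_size // 2
    # Clamp the top-left corner so the whole box stays inside the frame.
    left = min(max(gaze_x - half, 0), frame_w - crop_size)
    top = min(max(gaze_y - half, 0), frame_h - crop_size)
    return left, top, left + crop_size, top + crop_size


def crop_frame(frame, box):
    """Crop a frame stored as a list of rows (each row a list of pixels)."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


# Toy 6x8 "frame" whose pixel values encode their (row, column) position.
frame = [[(r, c) for c in range(8)] for r in range(6)]

# Gaze near the top-right corner: the box is shifted to stay in the frame.
box = gaze_crop_box(frame_w=8, frame_h=6, gaze_x=7, gaze_y=1, crop_size=4)
patch = crop_frame(frame, box)
```

In an SSL pipeline, a patch like this would stand in for the full frame as the model input, with the gaze coordinates supplied per frame by a gaze prediction model; the uniform-input baseline corresponds to skipping the crop.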

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Face Recognition and Perception