# Eyes ahead: a scoping review of technologies enabling humanoid robots to follow human gaze

**Authors:** Leana Neuber, Wolf Culemann, Ruth Maria Ingendoh, Angela Heine

PMC · DOI: 10.3389/frobt.2025.1723527 · Frontiers in Robotics and AI · 2026-01-16

## TL;DR

This scoping review surveys 28 studies on how gaze-following is technically implemented in humanoid robots and organizes the approaches into a three-dimension taxonomy: environment tracking, gaze tracking, and gaze–environment mapping.

## Contribution

The paper contributes a structured taxonomy of gaze-following technologies in human–robot interaction (HRI), derived from a systematic search across four databases that identified 28 studies.

## Key findings

- Gaze-following in robots is categorized along three functional dimensions: environment tracking, gaze tracking, and gaze-environment mapping.
- Constrained approaches, such as predefined object positions, provide high accuracy but are often limited to controlled settings; unconstrained methods offer greater flexibility but pose significant technical challenges.
- Robust, real-time gaze-following remains challenging, particularly in dynamic, real-world environments; the authors recommend refining unconstrained tracking methods and leveraging advances in machine learning and computer vision.
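
To make the third dimension, gaze–environment mapping, more concrete, here is a minimal Python sketch of a constrained setup in which object positions are predefined. It is an illustration only and is not taken from the paper or from any of the reviewed studies: the function name `map_gaze_to_object`, the 10° angular threshold, and the coordinates are all assumptions. Given an eye position and a gaze direction, it returns the known object whose direction deviates least from the gaze ray, or `None` if no object falls within the threshold.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _norm(v: Vec3) -> float:
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def _angle_between(u: Vec3, v: Vec3) -> float:
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (_norm(u) * _norm(v))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def map_gaze_to_object(
    eye: Vec3,
    gaze_dir: Vec3,
    objects: Dict[str, Vec3],
    max_angle_deg: float = 10.0,  # hypothetical tolerance, not from the paper
) -> Optional[str]:
    """Return the object closest in angle to the gaze ray, or None."""
    if _norm(gaze_dir) == 0.0:
        raise ValueError("gaze_dir must be a non-zero vector")

    best_name: Optional[str] = None
    best_angle = math.radians(max_angle_deg)
    for name, pos in objects.items():
        to_obj = (pos[0] - eye[0], pos[1] - eye[1], pos[2] - eye[2])
        if _norm(to_obj) == 0.0:
            continue  # object coincides with the eye position; skip it
        angle = _angle_between(gaze_dir, to_obj)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


# Hypothetical scene: two objects at predefined positions (metres).
scene = {"cup": (0.5, 0.0, 1.0), "book": (-0.5, 0.0, 1.0)}
target = map_gaze_to_object((0.0, 0.0, 0.0), (0.5, 0.0, 1.0), scene)
```

In this sketch the predefined `scene` dictionary stands in for environment tracking and the `gaze_dir` vector stands in for the output of gaze tracking. An unconstrained system would have to estimate both at runtime, which is where the technical challenges noted above arise.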

## Abstract

Gaze is a fundamental aspect of non-verbal communication in human interaction, playing an important role in conveying attention, intentions, and emotions. A key concept in gaze-based human interaction is joint attention, the focus of two individuals on an object in a shared environment. In the context of human–robot interaction (HRI), gaze-following has become a growing research area, as it enables robots to appear more socially intelligent, engaging, and likable. While various technical approaches have been developed to achieve this capability, a comprehensive overview of existing implementations has been lacking. This scoping review addresses this gap by systematically categorizing existing solutions, offering a structured perspective on how gaze-following behavior is technically realized in the field of HRI. A systematic search was conducted across four databases, leading to the identification of 28 studies. To structure the findings, a taxonomy was developed that categorizes technological approaches along three key functional dimensions: (1) environment tracking, which involves recognizing the objects in the robot’s surroundings; (2) gaze tracking, which refers to detecting and interpreting human gaze direction; and (3) gaze–environment mapping, which connects gaze information with objects in the shared environment to enable appropriate robotic responses. Across studies, a distinction emerges between constrained and unconstrained solutions. While constrained approaches, such as predefined object positions, provide high accuracy, they are often limited to controlled settings. In contrast, unconstrained methods offer greater flexibility but pose significant technical challenges. The complexity of the implementations also varies significantly, from simple rule-based approaches to advanced, adaptive systems that integrate multiple data sources. 
These findings highlight ongoing challenges in achieving robust and real-time gaze-following in robots, particularly in dynamic, real-world environments. Future research should focus on refining unconstrained tracking methods and leveraging advances in machine learning and computer vision to make human–robot interactions more natural and socially intuitive.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12856928/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12856928/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12856928/full.md

---
Source: https://tomesphere.com/paper/PMC12856928