# Detecting Gaze Towards Eyes in Natural Social Interactions and its Use in Child Assessment

**Authors:** Eunji Chong, Katha Chanda, Zhefan Ye, Audrey Southerland, Nataniel Ruiz, Rebecca M. Jones, Agata Rozga, James M. Rehg

arXiv: 1902.00607 · 2019-02-05

## TL;DR

This paper introduces a deep learning system for detecting eye contact in naturalistic adult-child interactions using egocentric video, aiding assessments of social communication skills, especially in children with autism.

## Contribution

The paper presents the Pose-Implicit CNN, an architecture that predicts eye contact while implicitly estimating head pose, and a fully automated system for eye contact detection from egocentric video that improves on existing methods.

## Key findings

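- Achieved an overall precision of 0.76, recall of 0.80, and area under the precision-recall curve of 0.79 for eye contact detection.
- Trained on a dataset comprising 22 hours of video from 156 play sessions with over 100 children, half of whom are diagnosed with Autism Spectrum Disorder.
- All three metrics are reported as significant improvements over existing methods.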
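
As a reminder of what the precision and recall figures measure, here is a minimal sketch that computes both from binary eye-contact labels. It assumes per-frame 0/1 labels; this summary does not state the granularity at which the paper evaluates. The function name `precision_recall` and the toy label sequences are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length sequences of 0/1 labels.

    Assumption: 1 means "eye contact" and 0 means "no eye contact" for a frame.
    """
    if len(predicted) != len(actual):
        raise ValueError("sequences must have equal length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Guard against division by zero when a class never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for illustration only.
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall = precision_recall(predicted, actual)
```

Precision is the fraction of predicted eye-contact frames that are correct, and recall is the fraction of true eye-contact frames that were found. The area under the precision-recall curve summarizes this trade-off across decision thresholds.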

## Abstract

Eye contact is a crucial element of non-verbal communication that signifies interest, attention, and participation in social interactions. As a result, measures of eye contact arise in a variety of applications such as the assessment of the social communication skills of children at risk for developmental disorders such as autism, or the analysis of turn-taking and social roles during group meetings. However, the automated measurement of visual attention during naturalistic social interactions is challenging due to the difficulty of estimating a subject's looking direction from video. This paper proposes a novel approach to eye contact detection during adult-child social interactions in which the adult wears a point-of-view camera which captures an egocentric view of the child's behavior. By analyzing the child's face regions and inferring their head pose we can accurately identify the onset and duration of the child's looks to their social partner's eyes. We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating the head pose. We present a fully automated system for eye contact detection that solves the sub-problems of end-to-end feature learning and pose estimation using deep neural networks. To train our models, we use a dataset comprising 22 hours of 156 play session videos from over 100 children, half of whom are diagnosed with Autism Spectrum Disorder. We report an overall precision of 0.76, recall of 0.80, and an area under the precision-recall curve of 0.79, all of which are significant improvements over existing methods.
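
The abstract mentions identifying the onset and duration of each look. One plausible way to derive these from per-frame detector output is sketched below, grouping consecutive positive frames into episodes. This is an illustrative assumption, not the authors' published post-processing; the function name `eye_contact_episodes` and the `fps` parameter are hypothetical.

```python
def eye_contact_episodes(frame_labels, fps=30.0):
    """Group consecutive positive frames into (onset_seconds, duration_seconds).

    Assumptions: frame_labels is a sequence of 0/1 per-frame predictions and
    the video has a constant frame rate given by fps.
    """
    episodes = []
    start = None  # index of the first frame in the current episode, if any
    for i, label in enumerate(frame_labels):
        if label and start is None:
            start = i  # an episode begins
        elif not label and start is not None:
            episodes.append((start / fps, (i - start) / fps))
            start = None  # the episode ended on the previous frame
    if start is not None:
        # Close an episode that runs to the end of the sequence.
        episodes.append((start / fps, (len(frame_labels) - start) / fps))
    return episodes


# Toy predictions at 2 frames per second, invented for illustration only.
episodes = eye_contact_episodes([0, 1, 1, 1, 0, 0, 1, 1], fps=2.0)
```

In practice one might also smooth the per-frame predictions or discard very short episodes before computing durations; whether the paper does so is not stated in this summary.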

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.00607/full.md

## Figures

37 figures with captions in the complete paper: https://tomesphere.com/paper/1902.00607/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1902.00607/full.md

---
Source: https://tomesphere.com/paper/1902.00607