# Product Engagement Detection Using Multi-Camera 3D Skeleton Reconstruction and Gaze Estimation

**Authors:** Matus Tanonwong, Yu Zhu, Naoya Chiba, Koichi Hashimoto

PMC · DOI: 10.3390/s25103031 · Sensors (Basel, Switzerland) · 2025-05-11

## TL;DR

This paper introduces a privacy-preserving system using 3D skeleton data and gaze estimation to detect customer engagement with products in retail settings.

## Contribution

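A novel Transformer-based gaze estimation system that operates directly on 3D skeletal keypoints, preserving privacy by avoiding personal appearance data. It is evaluated in a simulated retail environment and targeted at deployment in real-world retail settings.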

## Key findings

- The system reliably detects gaze-object and hand-object interactions in simulated retail environments.
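- Compared with a recent image-based method, the Transformer-based model achieves comparable gaze-object detection performance despite slightly higher mean angular errors in gaze estimation.
- The authors argue that the method's robustness, generalizability, and privacy preservation make it suitable for deployment in real-world retail scenarios such as convenience stores, supermarkets, and shopping malls.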
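
The two metrics contrasted in these findings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the axis-aligned box standing in for a product, and the ray test are all assumptions made for illustration. It shows how mean angular error between predicted and reference gaze directions could be computed, and why a small angular error can still produce the same gaze-object decision.

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, true))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, trues):
    """Average angular error over paired predicted/reference directions."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


def gaze_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the gaze ray intersect an axis-aligned box?

    The box is a hypothetical stand-in for a product region; the paper's
    actual gaze-object criterion is not described in this summary.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return False
    return True


# Hypothetical example: a reference gaze and a slightly perturbed prediction.
head = (0.0, 0.0, 0.0)
reference = (1.0, 0.0, 0.0)
predicted = (1.0, 0.1, 0.0)
shelf_item = ((1.0, -1.0, -1.0), (2.0, 1.0, 1.0))  # (box_min, box_max)

error = angular_error_deg(predicted, reference)
same_decision = gaze_hits_box(head, predicted, *shelf_item) == gaze_hits_box(
    head, reference, *shelf_item
)
```

In this toy setup the predicted direction deviates from the reference by a few degrees, yet both rays intersect the same box, so the detection outcome is unchanged.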

## Abstract

Product engagement detection in retail environments is critical for understanding customer preferences through nonverbal cues such as gaze and hand movements. This study presents a system leveraging a 360-degree top-view fisheye camera combined with two perspective cameras, the only sensors required for deployment, effectively capturing subtle interactions even under occlusion or distant camera setups. Unlike conventional image-based gaze estimation methods that are sensitive to background variations and require capturing a person’s full appearance, raising privacy concerns, our approach utilizes a novel Transformer-based encoder operating directly on 3D skeletal keypoints. This innovation significantly reduces privacy risks by avoiding personal appearance data and benefits from ongoing advancements in accurate skeleton estimation techniques. Experimental evaluation in a simulated retail environment demonstrates that our method effectively identifies critical gaze-object and hand-object interactions, reliably detecting customer engagement prior to product selection. Despite yielding slightly higher mean angular errors in gaze estimation compared to a recent image-based method, the Transformer-based model achieves comparable performance in gaze-object detection. Its robustness, generalizability, and inherent privacy preservation make it particularly suitable for deployment in practical retail scenarios such as convenience stores, supermarkets, and shopping malls, highlighting its superiority in real-world applicability.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12115284/full.md

## Figures

30 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12115284/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12115284/full.md

---
Source: https://tomesphere.com/paper/PMC12115284