# Continuous interaction with a smart speaker via low-dimensional embeddings of dynamic hand pose

**Authors:** Songpei Xu, Chaitanya Kaul, Xuri Ge, Roderick Murray-Smith

arXiv: 2302.14566 · 2023-03-01

## TL;DR

This paper embeds hand poses into a two-dimensional space so that mid-air gestures, recognized from only two video frames, can continuously control a smart music speaker and explore its musical content.

## Contribution

The paper presents a system that combines an autoencoder-based hand pose embedding, a compatible two-dimensional embedding of music track profiles, and a PointNet-based gesture classifier for real-time gesture recognition and device control.

## Key findings

- Gestures are recognized from only two video frames
- Experienced users select different musical moods by varying their hand pose
- Jointly optimizing the autoencoder with the classifier yields an embedding space that is more useful for discriminating gestures
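
One way to picture the joint optimization in the last finding is as a weighted sum of a reconstruction loss and a classification loss. This summary does not state the objective, so the form below is an assumption for illustration only: the encoder $f_\theta$, decoder $g_\phi$, classifier $h_\psi$, cross-entropy term, and weight $\lambda$ are hypothetical names, not notation from the paper.

$$
\mathcal{L}(\theta, \phi, \psi) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} \bigl\lVert x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \bigr\rVert_2^2}_{\text{reconstruction}} \;+\; \lambda \, \underbrace{\frac{1}{N}\sum_{i=1}^{N} \mathrm{CE}\bigl(h_\psi\bigl(f_\theta(x_i)\bigr),\, y_i\bigr)}_{\text{classification}}
$$

Here $x_i$ is a hand pose feature vector, $f_\theta(x_i) \in \mathbb{R}^2$ is its embedding, and $y_i$ is its gesture label. Under this assumed form, gradients from the classification term also reach the encoder, which is what would push the two-dimensional space to separate gestures rather than only reconstruct poses.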

## Abstract

This paper presents a new continuous interaction strategy with visual feedback of hand pose and mid-air gesture recognition and control for a smart music speaker, which utilizes only two video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles to a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we manage to learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
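
The data flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `flatten_pose` and `nearest_track` are hypothetical helpers, the wrist-relative normalization is an assumed preprocessing step, and nearest-neighbour lookup is only one plausible way to link a 2D pose point to a 2D music embedding. The trained autoencoder that would map the 63 features to 2D is not shown.

```python
import math

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def flatten_pose(landmarks):
    """Turn 21 (x, y, z) landmarks into a 63-value feature vector.

    Assumption: coordinates are made relative to the wrist (landmark 0 in
    MediaPipe Hands) so the features do not depend on where the hand is in
    the frame. The paper summary does not specify this step.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[0]
    features = []
    for x, y, z in landmarks:
        features.extend((x - wx, y - wy, z - wz))
    return features


def nearest_track(pose_point, track_points):
    """Return the name of the track whose 2D embedding is closest to pose_point.

    Assumption: both embeddings share one 2D coordinate system, and
    Euclidean distance is a sensible notion of closeness in it.
    """
    return min(track_points, key=lambda name: math.dist(pose_point, track_points[name]))


# Hypothetical 2D music embedding: mood label -> (x, y) position.
tracks = {"calm": (0.1, 0.2), "energetic": (0.9, 0.8)}

# Hypothetical encoder output for the current hand pose.
selected = nearest_track((0.8, 0.7), tracks)
```

With these made-up coordinates, `selected` is `"energetic"`, because the pose point lies much closer to that track's position than to `"calm"`.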

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14566/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14566/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/2302.14566/full.md

---
Source: https://tomesphere.com/paper/2302.14566