Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu

TL;DR
This paper introduces a real-time framework for non-verbal human-AI interaction using body motion, demonstrating the potential and limitations of current generative models in mimicking human-like body language.
Contribution
It presents the first real-time system generating natural non-verbal interactions from 2D body keypoints, with insights into the impact of synthetic training data and real-world performance gaps.
Findings
Pretraining on synthetic data reduces motion errors.
Performance drops when evaluated on certain AI-generated videos.
Temporal coherence influences real-world interaction quality.
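The findings above refer to motion error and temporal coherence without saying how either is measured. As a minimal sketch only, the snippet below assumes motion error is the mean Euclidean distance between predicted and reference 2D keypoints, and uses mean frame-to-frame displacement as a crude proxy for temporal coherence. Both metric definitions and the names `mean_joint_error` and `mean_jitter` are illustrative assumptions, not the paper's evaluation protocol.

```python
import math

# Assumed data layout: a pose is a list of (x, y) keypoints,
# and a sequence is a list of poses (one per frame).

def mean_joint_error(pred, target):
    """Average Euclidean distance between matching keypoints over all frames."""
    total, count = 0.0, 0
    for pose_p, pose_t in zip(pred, target):
        for (px, py), (tx, ty) in zip(pose_p, pose_t):
            total += math.hypot(px - tx, py - ty)
            count += 1
    return total / count

def mean_jitter(seq):
    """Average frame-to-frame keypoint displacement (hypothetical coherence proxy)."""
    total, count = 0.0, 0
    for prev, curr in zip(seq, seq[1:]):
        for (ax, ay), (bx, by) in zip(prev, curr):
            total += math.hypot(bx - ax, by - ay)
            count += 1
    return total / count

# Toy two-frame, two-keypoint sequences (made-up values for illustration).
target = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
pred = [[(3.0, 4.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]

error = mean_joint_error(pred, target)
jitter = mean_jitter(target)
print(error, jitter)
```

Under these assumptions, a lower `mean_joint_error` means the generated motion is closer to the reference. Comparing `mean_jitter` between generated and reference sequences gives a rough sense of whether the motion is unnaturally abrupt or unnaturally smooth.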
Abstract
We study the ongoing debate regarding the statistical fidelity of AI-generated data compared to human-generated data in the context of non-verbal communication using full body motion. Concretely, we ask if contemporary generative models move beyond surface mimicry to participate in the silent, but expressive dialogue of body language. We tackle this question by introducing the first framework that generates a natural non-verbal interaction between Human and AI in real-time from 2D body keypoints. Our experiments utilize four lightweight architectures which run at up to 100 FPS on an NVIDIA Orin Nano, effectively closing the perception-action loop needed for natural Human-AI interaction. We trained on 437 human video clips and demonstrated that pretraining on synthetically generated sequences reduces motion errors significantly, without sacrificing speed. Yet, a measurable reality gap…
Taxonomy
Topics: Social Robot Interaction and HRI · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
