Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints
Masoumeh Chapariniya, Hossein Ranjbar, Teodora Vukovic, Sarah Ebling, Volker Dellwo

TL;DR
This paper introduces a two-stream transformer framework that identifies people from the spatial configuration and temporal motion of upper-body keypoints during conversations, with the aim of making person identification more robust to deepfakes.
Contribution
It presents a dual-branch transformer model that jointly captures structural and motion features from conversational keypoints, enhancing identification accuracy over existing methods.
Findings
Achieved 80.12% accuracy with spatial stream
Combined fusion strategies improved accuracy to 94.86%
Demonstrated robustness against deepfake and spoofing techniques
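The summary does not say which fusion strategies were compared. As one illustration of how two streams can be combined, the sketch below shows plain score-level (late) fusion: each stream's class scores are turned into probabilities and averaged. The function names, the equal weighting, and the example scores are all assumptions made for illustration, not the authors' method or data.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(spatial_logits, temporal_logits, weight=0.5):
    """Weighted average of the two streams' class probabilities.

    `weight` is a hypothetical mixing coefficient for the spatial stream;
    the source does not state how the streams are weighted.
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [weight * s + (1.0 - weight) * t
            for s, t in zip(p_spatial, p_temporal)]

def predict_identity(spatial_logits, temporal_logits, weight=0.5):
    """Return the index of the identity with the highest fused probability."""
    fused = fuse_scores(spatial_logits, temporal_logits, weight)
    return max(range(len(fused)), key=fused.__getitem__)

# Made-up scores for three identities: the streams disagree, and fusion
# picks the identity that the temporal stream is most confident about.
spatial_example = [2.0, 1.0, 0.0]
temporal_example = [0.0, 3.0, 0.0]
fused_example = fuse_scores(spatial_example, temporal_example)
chosen = predict_identity(spatial_example, temporal_example)
```

Averaging probabilities rather than raw scores keeps the two streams on a comparable scale, which is one common reason to fuse at the score level.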
Abstract
In the age of AI-driven generative technologies, traditional biometric recognition systems face unprecedented challenges, particularly from sophisticated deepfake and face reenactment techniques. In this study, we propose a Two-Stream Spatial-Temporal Transformer Framework for person identification using upper body keypoints visible during online conversations, which we term conversational keypoints. Our framework processes both spatial relationships between keypoints and their temporal evolution through two specialized branches: a Spatial Transformer (STR) that learns distinctive structural patterns in keypoint configurations, and a Temporal Transformer (TTR) that captures sequential motion patterns. Using the state-of-the-art Sapiens pose estimator, we extract 133 keypoints (based on COCO-WholeBody format) representing facial features, head pose, and hand positions. The framework was…
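To make the two-branch idea concrete, here is a minimal sketch of how a single clip of keypoints could be arranged into token sequences for each stream: one token per keypoint for a spatial branch, one token per frame for a temporal branch. The 133-keypoint count comes from the abstract; the (x, y) coordinate layout, the per-keypoint trajectory tokens, the random stand-in data, and all function names are assumptions, since the abstract does not describe the actual tokenization.

```python
import random

NUM_KEYPOINTS = 133  # COCO-WholeBody keypoint count, as stated in the abstract
COORDS = 2           # assumption: (x, y) per keypoint; the source does not specify

def make_clip(num_frames, seed=0):
    """Hypothetical stand-in for pose-estimator output.

    Returns nested lists shaped [num_frames][NUM_KEYPOINTS][COORDS].
    """
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(COORDS)]
             for _ in range(NUM_KEYPOINTS)]
            for _ in range(num_frames)]

def spatial_tokens(clip):
    """One token per keypoint: that keypoint's coordinates over all frames.

    Attention over these tokens relates keypoints to one another.
    """
    num_frames = len(clip)
    return [[clip[t][k][c] for t in range(num_frames) for c in range(COORDS)]
            for k in range(NUM_KEYPOINTS)]

def temporal_tokens(clip):
    """One token per frame: all keypoint coordinates in that frame.

    Attention over these tokens relates frames to one another.
    """
    return [[value for keypoint in frame for value in keypoint]
            for frame in clip]

clip = make_clip(num_frames=4)
s_tokens = spatial_tokens(clip)   # 133 tokens, each of length 4 * 2
t_tokens = temporal_tokens(clip)  # 4 tokens, each of length 133 * 2
```

The same clip yields two different sequence layouts, so each branch attends along a different axis of the data.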
Taxonomy
Topics: Face recognition and analysis · Biometric Identification and Security · Gait Recognition and Analysis
