Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
Andrzej D. Dobrzycki, Ana M. Bernardos, Luca Bergesio, Andrzej Pomirski, Daniel Sáez-Trigueros

TL;DR
This paper evaluates the effectiveness of CLIP, a multimodal learning model, for human posture classification in yoga, demonstrating high accuracy, efficiency, and potential for real-time applications.
Contribution
It presents a detailed procedure for fine-tuning CLIP for yoga pose classification, achieving state-of-the-art accuracy with reduced training time.
Findings
Fine-tuned CLIP achieves over 85% accuracy on 82 classes.
High accuracy (98.8% and 99.1%) on small datasets with few training images.
Training with as few as 20 images per pose yields around 90% accuracy.
Abstract
Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, and daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess the effectiveness of CLIP in classifying human postures, focusing on its application in yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full procedure for fine-tuning, including the choice of image description syntax and the adjustment of models and hyperparameters. The fine-tuned CLIP model, tested on 3,826 images, achieves an accuracy of over 85%,…
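The abstract mentions zero-shot classification and the choice of image description syntax. As a rough illustration of how CLIP-style classification works in general, the sketch below scores one image embedding against one text embedding per pose description and picks the closest match. It is not the authors' implementation: the prompt template, the pose names, the toy 3-dimensional embeddings, and the logit_scale value are all assumptions made for illustration, and a real setup would obtain the embeddings from CLIP's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_prompts(pose_names, template="Image of a person doing the yoga pose {}"):
    """Turn class names into text descriptions (hypothetical template)."""
    return [template.format(name) for name in pose_names]


def classify(image_embedding, text_embeddings, logit_scale=100.0):
    """Return the index of the best-matching description and all class probabilities.

    logit_scale is an assumed temperature applied to the cosine similarities.
    """
    image = l2_normalize(image_embedding)
    sims = [
        sum(a * b for a, b in zip(image, l2_normalize(text)))
        for text in text_embeddings
    ]
    probs = softmax([logit_scale * s for s in sims])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy stand-ins for encoder outputs (assumed values, not real CLIP embeddings).
pose_names = ["tree", "warrior", "cobra"]
prompts = build_prompts(pose_names)
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.1, 0.9, 0.2]

best, probs = classify(image_embedding, text_embeddings)
print(prompts[best], round(probs[best], 3))
```

Fine-tuning, as described in the abstract, would update the encoders so that matching image and description pairs score higher than mismatched ones; the classification step itself stays the same.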
Taxonomy
Methods: Contrastive Language-Image Pre-training
