Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis

Andrzej D. Dobrzycki; Ana M. Bernardos; Luca Bergesio; Andrzej Pomirski; Daniel Sáez-Trigueros

arXiv:2501.07221·cs.CV·November 25, 2025

TL;DR

This paper evaluates the effectiveness of CLIP, a multimodal learning model, for human posture classification in yoga, demonstrating high accuracy, efficiency, and potential for real-time applications.

Contribution

The paper presents a detailed procedure for fine-tuning CLIP for yoga pose classification, achieving state-of-the-art accuracy with reduced training time.

Findings

1. Fine-tuned CLIP achieves over 85% accuracy on 82 classes.

2. Fine-tuned models reach high accuracy (98.8% and 99.1%) on small datasets with few training images.

3. Training with as few as 20 images per pose yields around 90% accuracy.

Abstract

Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, and daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pre-training (CLIP), have advanced significantly in jointly understanding images and text. This study assesses the effectiveness of CLIP in classifying human postures, focusing on its application in yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full fine-tuning procedure, including the choice of image description syntax, models, and hyperparameter adjustment. The fine-tuned CLIP model, tested on 3,826 images, achieves an accuracy of over 85%,…
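To make the zero-shot approach mentioned in the abstract concrete, here is a minimal sketch of how a CLIP-style model scores one image against a set of text prompts: both embeddings are L2-normalized, their cosine similarities are scaled, and a softmax turns them into class probabilities. Everything specific here is an assumption for illustration only: the prompt template is hypothetical and is not the image description syntax chosen in the paper, the three class names are placeholders, the toy 3-dimensional vectors stand in for real CLIP encoder outputs, and the logit scale of 100 is a commonly seen value for pretrained CLIP checkpoints rather than a figure from this work.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt.

    logit_scale=100.0 is an assumed value, not taken from the paper.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_prompts(class_names,
                  template="a photo of a person doing the {} yoga pose"):
    """Fill a (hypothetical) prompt template with each class name."""
    return [template.format(name) for name in class_names]


# Placeholder class names; the dataset in the paper has 82 classes.
class_names = ["tree", "warrior", "cobra"]
prompts = build_prompts(class_names)

# Toy embeddings standing in for the text and image encoder outputs.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.2, 0.9, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(probs)), key=probs.__getitem__)
predicted = class_names[best]
print(predicted, [round(p, 4) for p in probs])
```

In a real pipeline the toy vectors would be replaced by embeddings from a pretrained CLIP image encoder and text encoder. Fine-tuning, as described in the abstract, would then update the model weights so that each image lands closer to the prompt for its correct pose.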

Taxonomy

Methods: Contrastive Language-Image Pre-training