Breakdance Video classification in the age of Generative AI

Sauptik Dhar; Naveen Ramakrishnan; Michelle Munson

arXiv:2510.20287 · cs.CV · October 24, 2025

Open Access

TL;DR

This paper evaluates modern video foundation models for classifying breakdance videos. It finds that video encoder models outperform video language models on this task, and it offers guidance on encoder selection along with an analysis of a fine-tuned decoder model.

Contribution

The paper analyzes how well modern video foundation models, both encoders and decoders, apply to breakdance classification. It shows that encoder models outperform video language models and examines the workings of a fine-tuned decoder model.

Findings

01

Video encoder models outperform state-of-the-art video language models in breakdance classification.

02

Insights into selecting appropriate encoder models for dance video analysis.

03

Thorough analysis of fine-tuned decoder models for this niche application.

Abstract

Large vision language models have recently been applied widely across sports use cases. Most of this work targets a limited subset of popular sports, such as soccer, cricket, and basketball, and focuses on generative tasks such as visual question answering and highlight generation. This work analyzes the applicability of modern video foundation models (both encoder and decoder) to a niche but hugely popular dance sport: breakdance. Our results show that video encoder models continue to outperform state-of-the-art video language models on prediction tasks. We offer insights into how to choose the encoder model and provide a thorough analysis of the workings of a fine-tuned decoder model for breakdance video classification.

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition