Breakdance Video classification in the age of Generative AI
Sauptik Dhar, Naveen Ramakrishnan, Michelle Munson

TL;DR
This paper evaluates modern video foundation models for classifying breakdance videos. It finds that video encoder models outperform state-of-the-art video language models on this prediction task, and it offers guidance on choosing an encoder model along with an analysis of a fine-tuned decoder model.
Contribution
The paper analyzes modern video foundation models, both encoder and decoder, applied to breakdance video classification. It reports that encoder models perform better on prediction tasks and examines in detail how a fine-tuned decoder model works.
Findings
Video encoder models outperform state-of-the-art video language models in breakdance video classification.
The paper gives guidance on how to choose an encoder model for this task.
It provides a thorough analysis of how a fine-tuned decoder model works for breakdance video classification.
Abstract
Large vision language models have recently been applied widely across sports use cases. Most of this work targets a limited subset of popular sports, such as soccer, cricket, and basketball, and focuses on generative tasks such as visual question answering and highlight generation. This work analyzes the applicability of modern video foundation models (both encoder and decoder) to a niche but hugely popular dance sport: breakdance. Our results show that video encoder models continue to outperform state-of-the-art video language models on prediction tasks. We provide insights on how to choose the encoder model and a thorough analysis of the workings of a fine-tuned decoder model for breakdance video classification.
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
