SV3.3B: A Sports Video Understanding Model for Action Recognition

Sai Varun Kodathala; Yashwanth Reddy Vutukoori; Rakesh Vunnam

arXiv:2507.17844·cs.CV·August 26, 2025

TL;DR

SV3.3B is a lightweight sports video understanding model that combines temporal motion difference sampling with self-supervised learning to recognize and describe athletic actions in fine detail.

Contribution

The paper introduces SV3.3B, a 3.3B-parameter model that integrates temporal motion difference sampling and self-supervised learning for on-device sports video analysis.

Findings

01 Outperforms larger models such as GPT-4o in sports description accuracy

02 Achieves a 29.2% improvement in validation metrics over GPT-4o

03 Demonstrates high information density and action complexity recognition

Abstract

This paper addresses the challenge of automated sports video analysis, which has traditionally been limited by computationally intensive models requiring server-side processing and lacking fine-grained understanding of athletic movements. Current approaches struggle to capture the nuanced biomechanical transitions essential for meaningful sports analysis, often missing critical phases like preparation, execution, and follow-through that occur within seconds. To address these limitations, we introduce SV3.3B, a lightweight 3.3B parameter video understanding model that combines novel temporal motion difference sampling with self-supervised learning for efficient on-device deployment. Our approach employs a DWT-VGG16-LDA based keyframe extraction mechanism that intelligently identifies the 16 most representative frames from sports sequences, followed by a V-DWT-JEPA2 encoder pretrained…
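The idea of picking a fixed budget of 16 frames based on motion can be illustrated with a much simpler stand-in. The sketch below is not the paper's DWT-VGG16-LDA mechanism: it scores each frame by the mean absolute pixel difference from its predecessor and keeps the 16 highest-scoring frames in temporal order. The function names `motion_scores` and `select_keyframes` are hypothetical, and raw pixel differencing is an assumption made purely for illustration.

```python
import numpy as np


def motion_scores(frames: np.ndarray) -> np.ndarray:
    """Score each frame by its mean absolute difference from the previous frame.

    `frames` has shape (T, H, W) or (T, H, W, C). The first frame has no
    predecessor, so it is assigned a score of 0.
    NOTE: plain pixel differencing is an illustrative assumption, not the
    DWT-VGG16-LDA scoring described in the abstract.
    """
    frames = frames.astype(np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    # Average over every axis except time.
    per_frame = diffs.mean(axis=tuple(range(1, diffs.ndim)))
    return np.concatenate([[0.0], per_frame])


def select_keyframes(frames: np.ndarray, k: int = 16) -> np.ndarray:
    """Return the indices of the k highest-motion frames, in temporal order."""
    scores = motion_scores(frames)
    k = min(k, len(scores))
    # Stable sort ascending, reverse for descending, keep the top k.
    top = np.argsort(scores, kind="stable")[::-1][:k]
    return np.sort(top)
```

Calling `select_keyframes(clip)` on an array of decoded frames returns at most 16 indices, which could then be used to slice the clip before passing it to an encoder. A learned or wavelet-based score could replace `motion_scores` without changing the selection step.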

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization