Domain Adaptation of VLM for Soccer Video Understanding
Tiancheng Jiang, Henry Wang, Md Sirajus Salekin, Parmida Atighehchian, Shinan Zhang

TL;DR
This paper explores adapting open-source Vision Language Models to soccer video understanding by fine-tuning with large-scale soccer datasets, significantly improving performance on soccer-specific tasks.
Contribution
It introduces a curriculum learning approach for adapting general-domain VLMs to soccer video, fine-tuning on large-scale soccer datasets and LLM-generated instruction-following data. The abstract describes this kind of transfer to specialized domains as under-explored.
Findings
37.5% relative improvement in soccer VQA
Accuracy increase from 11.8% to 63.5% in soccer action classification
Effective domain-specific adaptation of VLMs for sports videos
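The findings above mix two kinds of numbers: a relative improvement (VQA) and a before/after accuracy pair (action classification). The short sketch below shows how the two measures differ, using only the action classification figures reported here. The function names are illustrative and do not come from the paper.

```python
def absolute_gain(before: float, after: float) -> float:
    """Difference in percentage points between two accuracies."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Improvement expressed as a percentage of the starting value."""
    return (after - before) / before * 100.0


# Action classification accuracies quoted in the findings (in percent).
base_acc, adapted_acc = 11.8, 63.5

abs_gain = absolute_gain(base_acc, adapted_acc)
rel_gain = relative_gain(base_acc, adapted_acc)

print(f"absolute gain: {abs_gain:.1f} points")  # 51.7 points
print(f"relative gain: {rel_gain:.1f}%")        # 438.1%
```

The underlying VQA scores are not given in this summary, so the 37.5% relative figure cannot be converted to percentage points here.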
Abstract
Vision Language Models (VLMs) have demonstrated strong performance in multi-modal tasks by effectively aligning visual and textual representations. However, most video understanding VLM research has been domain-agnostic, leaving their transfer learning capability to specialized domains under-explored. In this work, we address this by exploring the adaptability of open-source VLMs to specific domains, focusing on soccer as an initial case study. Our approach uses large-scale soccer datasets and an LLM to create instruction-following data, and uses this data to iteratively fine-tune the general-domain VLM in a curriculum learning fashion (first teaching the model key soccer concepts, then moving on to question answering tasks). The final adapted model, trained using a curated dataset of 20k video clips, exhibits significant improvement in soccer-specific tasks compared to the base…
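The curriculum ordering described in the abstract (soccer concepts first, question answering second) can be sketched as a loop over stages. This is a minimal illustration under stated assumptions, not the authors' training code: `Stage`, `run_curriculum`, and `toy_fine_tune` are hypothetical names, the example prompts are invented placeholders, and the "model" is just a dictionary that counts examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Stage:
    """One curriculum stage: a name plus its instruction-following examples."""
    name: str
    examples: List[Dict[str, str]]


def run_curriculum(
    state: dict,
    stages: List[Stage],
    fine_tune: Callable[[dict, List[Dict[str, str]]], dict],
) -> Tuple[dict, List[str]]:
    """Fine-tune stage by stage, carrying the model state forward."""
    completed = []
    for stage in stages:
        # Each stage starts from the state produced by the previous one.
        state = fine_tune(state, stage.examples)
        completed.append(stage.name)
    return state, completed


def toy_fine_tune(state: dict, examples: List[Dict[str, str]]) -> dict:
    # Stand-in for a real training step (assumption): only counts examples.
    return {**state, "examples_seen": state["examples_seen"] + len(examples)}


# Placeholder data (invented for illustration): concepts first, then QA.
stages = [
    Stage("soccer_concepts",
          [{"prompt": "What event is shown?", "answer": "corner kick"}]),
    Stage("question_answering",
          [{"prompt": "Which team took the corner?", "answer": "the team in red"}]),
]

final_state, order = run_curriculum({"examples_seen": 0}, stages, toy_fine_tune)
print(order, final_state)
```

In a real setup, `fine_tune` would wrap a training loop for the chosen VLM; the point of the sketch is only that later stages build on the weights produced by earlier ones.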
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Analysis and Summarization · Human Pose and Action Recognition
Methods: Balanced Selection
