Human-in-the-loop Adaptation in Group Activity Feature Learning for Team Sports Video Retrieval
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita

TL;DR
This paper introduces a human-in-the-loop adaptation method for group activity feature learning (GAFL) in team sports video retrieval. It improves retrieval performance without group activity annotations by combining interactive fine-tuning with contrastive learning.
Contribution
It combines self-supervised pre-training of the group activity feature (GAF) space with interactive fine-tuning to improve sports video retrieval, removing the need for labeled group activity data.
Findings
Significant improvement in retrieval accuracy on sports datasets.
Effective use of contrastive learning with user-labeled videos.
Ablation studies confirm that each component contributes to performance.
Abstract
This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations. This human-in-the-loop adaptation is employed in a group-activity video retrieval framework to improve its retrieval performance. Our method initially pre-trains the GAF space based on the similarity of group activities in a self-supervised manner, unlike prior work that classifies videos into pre-defined group activity classes in a supervised learning manner. Our interactive fine-tuning process updates the GAF space to allow a user to better retrieve videos similar to query videos given by the user. In this fine-tuning, our proposed data-efficient video selection process provides several videos, which are selected from a video database, to the user in order to manually label these videos as positive or negative. These labeled videos are used to update (i.e.,…
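The sketch below makes the retrieve-then-label loop from the abstract concrete. It assumes each video is already embedded as a fixed-length GAF vector. The helper names (`retrieve`, `contrastive_loss`), the cosine similarity, and the InfoNCE-style loss with a `temperature` parameter are all illustrative assumptions. The abstract specifies neither the exact loss, the data-efficient video selection rule, nor the update procedure, so this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve(query: Vector, database: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k database videos closest to the query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(database)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(
    query: Vector,
    positives: List[Vector],
    negatives: List[Vector],
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style loss over user-labeled videos.

    Each positive is contrasted against all negatives; the loss is low when
    positives are much closer to the query than negatives are.
    """
    if not positives:
        raise ValueError("need at least one user-labeled positive")
    neg_scores = [cosine(query, n) / temperature for n in negatives]
    total = 0.0
    for p in positives:
        pos_score = cosine(query, p) / temperature
        # -log( exp(pos) / (exp(pos) + sum(exp(neg))) )
        total += _logsumexp([pos_score] + neg_scores) - pos_score
    return total / len(positives)
```

In an interactive loop, the indices returned by `retrieve` would be shown to the user. The videos the user marks as positive or negative would then be passed to `contrastive_loss`, and the feature extractor would be updated by gradient descent on that loss. The gradient step is omitted here because it depends on the network, which this summary does not describe.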
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
