Multi-Modal Soccer Scene Analysis with Masked Pre-Training
Marc Peral, Guillem Capellera, Luis Ferraz, Antonio Rubio, Antonio Agudo

TL;DR
This paper presents a multi-modal transformer-based approach for analyzing soccer scenes, accurately inferring ball trajectories, states, and possessors from noisy, real-world footage without relying on precise ball tracking.
Contribution
It introduces a novel multi-modal architecture with a CropDrop pre-training strategy, enabling robust soccer scene analysis without explicit ball tracking or handcrafted heuristics.
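The abstract describes CropDrop as a modality-specific masking pre-training strategy, and its full description is truncated. The sketch below shows what such masking could look like. It assumes that CropDrop hides only the image-crop modality for a random fraction of player-frame slots, while player trajectories and player types stay visible. The function name `crop_drop`, the dict-based token layout and the `ratio` parameter are hypothetical. This is not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical token layout: tokens[t][p] holds the inputs of player p at
# frame t as a dict with keys "traj", "type" and "crop".
Token = Dict[str, object]


def crop_drop(
    tokens: List[List[Token]], ratio: float, rng: random.Random
) -> Tuple[List[List[Token]], List[Tuple[int, int]]]:
    """Illustrative modality-specific masking (assumed behaviour, not the paper's).

    Returns a copy of `tokens` in which the "crop" entry is set to None for a
    random `ratio` of (frame, player) slots, plus the sorted list of masked slots.
    Trajectory and player-type entries are never touched, and the input is not
    modified in place.
    """
    slots = [(t, p) for t in range(len(tokens)) for p in range(len(tokens[t]))]
    n_mask = round(ratio * len(slots))
    masked = sorted(rng.sample(slots, n_mask))
    masked_set = set(masked)

    out: List[List[Token]] = []
    for t, frame in enumerate(tokens):
        new_frame = []
        for p, tok in enumerate(frame):
            new_tok = dict(tok)  # shallow copy so the original clip is unchanged
            if (t, p) in masked_set:
                # In a real model a learned [MASK] embedding would replace the crop.
                new_tok["crop"] = None
            new_frame.append(new_tok)
        out.append(new_frame)
    return out, masked


# Usage: 4 frames x 3 players, masking half of the crop slots.
rng = random.Random(0)
clip = [[{"traj": (t, p), "type": "player", "crop": f"img_{t}_{p}"}
         for p in range(3)] for t in range(4)]
masked_clip, masked_slots = crop_drop(clip, ratio=0.5, rng=rng)
print(len(masked_slots))  # 6
```

During pre-training, the model would then be trained on its tasks (or on recovering the hidden information) from the remaining modalities. This is one plausible way such masking could make the model robust to missing or occluded crops.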
Findings
Significant improvements over state-of-the-art baselines in all tasks
Effective inference of ball trajectory without direct position data
Robust identification of ball state and possessor under occlusion
Abstract
In this work we propose a multi-modal architecture for analyzing soccer scenes from tactical camera footage, with a focus on three core tasks: ball trajectory inference, ball state classification, and ball possessor identification. To this end, our solution integrates three distinct input modalities (player trajectories, player types and image crops of individual players) into a unified framework that processes spatial and temporal dynamics using a cascade of sociotemporal transformer blocks. Unlike prior methods, which rely heavily on accurate ball tracking or handcrafted heuristics, our approach infers the ball trajectory without direct access to its past or future positions, and robustly identifies the ball state and ball possessor under noisy or occluded conditions from real top-league matches. We also introduce CropDrop, a modality-specific masking pre-training strategy that…
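The phrase "cascade of sociotemporal transformer blocks" suggests attention that alternates between a temporal axis (each agent attends over its own frames) and a social axis (all agents attend to each other within one frame). The sketch below illustrates that factorization under strong simplifying assumptions. The three modalities are assumed to be already fused into one embedding per player per frame. Attention is single-head with identity query/key/value projections. Each step adds only a residual connection, with no layer norm or feed-forward layer. The names `attend` and `sociotemporal_block` are hypothetical and do not come from the paper.

```python
import math
from typing import List

Vec = List[float]


def attend(xs: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, xs)) / z for j in range(d)])
    return out


def sociotemporal_block(x: List[List[Vec]]) -> List[List[Vec]]:
    """One illustrative block; x[t][p] is the fused embedding of agent p at frame t.

    Temporal step: each agent attends over its own T frames (plus residual).
    Social step: at each frame, the P agents attend to each other (plus residual).
    """
    T, P = len(x), len(x[0])

    # Temporal attention, run independently for every agent.
    y = [[list(v) for v in frame] for frame in x]
    for p in range(P):
        att = attend([x[t][p] for t in range(T)])
        for t in range(T):
            y[t][p] = [a + b for a, b in zip(x[t][p], att[t])]

    # Social attention, run independently for every frame.
    z = []
    for t in range(T):
        att = attend(y[t])
        z.append([[a + b for a, b in zip(y[t][p], att[p])] for p in range(P)])
    return z


# Usage: 3 frames, 2 agents, 4-dimensional embeddings, a cascade of two blocks.
clip = [[[0.1 * (t + p), 1.0, -0.5, 0.2 * p] for p in range(2)] for t in range(3)]
h = clip
for _ in range(2):
    h = sociotemporal_block(h)
print(len(h), len(h[0]), len(h[0][0]))  # 3 2 4
```

For P agents and T frames, this factorization computes about P·T² + T·P² attention scores per block, instead of (P·T)² for full attention over every player-frame token. That saving is a common motivation for this kind of design, although the paper's exact block structure may differ.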
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
