EAST: Early Action Prediction Sampling Strategy with Token Masking

Iva Sović; Ivan Martinović; Marin Oršić

arXiv:2604.18367·cs.CV·April 21, 2026

TL;DR

EAST is an efficient framework for early action prediction that uses randomized training and token masking to improve generalization and scalability, reaching state-of-the-art performance across multiple datasets.

Contribution

The paper introduces a randomized training strategy, which samples the time step separating observed from unobserved video frames, and a token masking procedure. Together they improve the generalization and scalability of early action prediction models.

Findings

01

EAST achieves state-of-the-art accuracy on NTU60, SSv2, and UCF101 datasets.

02

Token masking reduces memory usage by half and doubles training speed with minimal accuracy loss.

03

Joint learning on observed and future representations significantly improves prediction performance.
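
The token masking in finding 02 can be illustrated with a minimal sketch. The summary does not say how EAST chooses which tokens to drop, so uniform random selection, the 0.5 keep ratio, and the name `mask_tokens` are all assumptions made for illustration only.

```python
import random

def mask_tokens(tokens, keep_ratio=0.5, rng=None):
    """Keep a random subset of tokens, preserving their original order.

    Hypothetical helper: uniform random selection is an assumption,
    not necessarily the masking rule used by EAST.
    """
    if not tokens:
        return [], []
    rng = rng or random.Random()
    # Always keep at least one token so the encoder has some input.
    num_keep = max(1, round(len(tokens) * keep_ratio))
    # Sample distinct positions, then sort to preserve temporal/spatial order.
    kept_idx = sorted(rng.sample(range(len(tokens)), num_keep))
    return [tokens[i] for i in kept_idx], kept_idx

# Toy sequence of 16 tokens; half of them survive masking.
tokens = [f"tok{i}" for i in range(16)]
kept, kept_idx = mask_tokens(tokens, keep_ratio=0.5, rng=random.Random(0))
```

With a keep ratio of 0.5 the encoder processes half as many tokens per sample, which is one plausible reading of how the reported memory and speed gains could arise.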

Abstract

Early action prediction seeks to anticipate an action before it fully unfolds, but limited visual evidence makes this task especially challenging. We introduce EAST, a simple and efficient framework that enables a model to reason about incomplete observations. In our empirical study, we identify key components when training early action prediction models. Our key contribution is a randomized training strategy that samples a time step separating observed and unobserved video frames, enabling a single model to generalize seamlessly across all test-time observation ratios. We further show that joint learning on both observed and future (oracle) representations significantly boosts performance, even allowing an encoder-only model to excel. To improve scalability, we propose a token masking procedure that cuts memory usage in half and accelerates training by 2x with negligible accuracy loss.…
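
The randomized training strategy described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the uniform distribution over split points, the requirement of at least one frame on each side, and the names `sample_observed_split` and `observed_prefix` are all hypothetical.

```python
import random

def sample_observed_split(num_frames, rng=None):
    """Training time: sample a time step t that separates observed
    frames [0, t) from unobserved (future) frames [t, num_frames).

    Assumption: t is drawn uniformly and both sides are non-empty.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to split")
    rng = rng or random.Random()
    t = rng.randint(1, num_frames - 1)  # randint is inclusive on both ends
    observed = list(range(t))
    future = list(range(t, num_frames))
    return observed, future

def observed_prefix(num_frames, ratio):
    """Test time: frame indices visible at a fixed observation ratio."""
    t = max(1, min(num_frames, round(num_frames * ratio)))
    return list(range(t))

# One training draw on a 16-frame clip, and a 25% test-time prefix.
observed, future = sample_observed_split(16, rng=random.Random(0))
prefix = observed_prefix(16, 0.25)
```

Because each training step sees a different split point, a single model is exposed to many observation ratios, which is how one set of weights can serve every ratio evaluated at test time.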

Videos

EAST: Early Action Prediction Sampling Strategy with Token Masking · SlidesLive