Online Localization and Prediction of Actions and Interactions
Khurram Soomro, Haroon Idrees, and Mubarak Shah

TL;DR
This paper introduces an online, person-centric method for real-time localization and prediction of actions and interactions in videos, addressing the limitations of offline approaches by updating models continuously.
Contribution
It presents a novel online framework combining pose estimation, appearance modeling, and structured SVM-based prediction for timely action detection and forecasting.
Findings
Achieves accuracy competitive with offline methods while observing only a few frames
Effectively handles pose estimation noise and visual drift online
Provides a unified approach for detection and prediction in real-time
Abstract
This paper proposes a person-centric and online approach to the challenging problem of localization and prediction of actions and interactions in videos. Typically, localization or recognition is performed in an offline manner where all the frames in the video are processed together. This prevents timely localization and prediction of actions and interactions - an important consideration for many tasks including surveillance and human-machine interaction. In our approach, we estimate human poses at each frame and train discriminative appearance models using the superpixels inside the pose bounding boxes. Since the pose estimation per frame is inherently noisy, the conditional probability of pose hypotheses at the current time-step (frame) is computed using pose estimations in the current frame and their consistency with poses in the previous frames. Next, both the superpixel and…
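As an illustration of the idea of weighting noisy per-frame pose hypotheses by their consistency with earlier frames, here is a minimal sketch. It is not the authors' formulation: the function name `pose_posteriors`, the Gaussian consistency term, and the `sigma` parameter are all assumptions chosen for illustration, and only the immediately preceding frame is used.

```python
import math


def pose_posteriors(hypotheses, prev_pose, sigma=20.0):
    """Score pose hypotheses for the current frame (illustrative only).

    hypotheses: list of (joints, detector_score), where joints is a list of (x, y).
    prev_pose:  joints selected in the previous frame, or None for the first frame.
    sigma:      assumed tolerance, in pixels, for joint displacement between frames.
    Returns one normalized probability per hypothesis.
    """
    weights = []
    for joints, score in hypotheses:
        if prev_pose is None:
            # No history yet: rely on the per-frame detector score alone.
            consistency = 1.0
        else:
            # Mean squared joint displacement relative to the previous pose.
            sq = sum((x - px) ** 2 + (y - py) ** 2
                     for (x, y), (px, py) in zip(joints, prev_pose))
            mean_sq = sq / len(joints)
            # Assumed Gaussian penalty on large jumps between frames.
            consistency = math.exp(-mean_sq / (2.0 * sigma ** 2))
        weights.append(score * consistency)

    total = sum(weights)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Two hypotheses with equal detector scores; only the first is near the previous pose.
prev = [(0, 0), (10, 10)]
hyps = [([(1, 1), (11, 11)], 0.5), ([(100, 100), (110, 110)], 0.5)]
probs = pose_posteriors(hyps, prev)
```

In this sketch the hypothesis close to the previous pose receives nearly all of the probability mass, which is the qualitative behavior the abstract describes for suppressing per-frame pose noise.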
Taxonomy
Methods: Support Vector Machine
