CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors for test time refinement
Carlos Plou, Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Ana C. Murillo

TL;DR
This paper presents Bayesian-VSLNet, a model that improves step grounding in egocentric videos by adding a Bayesian temporal-order prior at test time. It achieves state-of-the-art accuracy on the Ego4D Goal-Step dataset.
Contribution
The paper adds a Bayesian temporal-order prior to VSLNet for test-time refinement. This is a novel approach to detecting temporal boundaries more accurately in long, untrimmed videos.
Findings
Achieved 35.18% Recall Top-1 at 0.3 IoU on the Ego4D Goal-Step test set.
Achieved 20.48% Recall Top-1 at 0.5 IoU on the Ego4D Goal-Step test set.
Outperformed existing methods on this benchmark, reaching state-of-the-art results.
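Recall Top-1 at a given IoU threshold counts the fraction of queries whose highest-ranked predicted segment overlaps the ground-truth segment with a temporal IoU of at least that threshold. Below is a minimal sketch of the computation. It assumes one ground-truth segment per query and segments given as (start, end) in seconds. The function names are hypothetical, and the official Ego4D evaluation script may differ in details.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # For disjoint intervals this span also covers the gap, but inter is 0 then.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment], iou_thr: float) -> float:
    """Percentage of queries whose top-1 prediction has IoU >= iou_thr
    with its (single) ground-truth segment."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_thr for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


# Toy example: three queries, one top-1 prediction each.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
gts = [(12.0, 22.0), (6.0, 9.0), (30.0, 34.0)]
print(round(recall_at_1(preds, gts, 0.3), 2))  # 66.67 (IoUs: 0.67, 0.0, 0.4)
print(round(recall_at_1(preds, gts, 0.5), 2))  # 33.33
```

A prediction can count as a hit at 0.3 IoU but a miss at 0.5 IoU, so Recall Top-1 at 0.5 IoU can never exceed Recall Top-1 at 0.3 IoU. The reported 35.18% and 20.48% follow this ordering.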
Abstract
The goal of the Step Grounding task is to locate the temporal boundaries of activities from natural language descriptions. This technical report introduces Bayesian-VSLNet to address the challenge of identifying such temporal segments in lengthy, untrimmed egocentric videos. Our model significantly improves upon traditional models by incorporating a novel Bayesian temporal-order prior during inference, which enhances the accuracy of moment predictions. This prior accounts for cyclic and repetitive actions within videos. Our evaluations demonstrate superior performance over existing methods, achieving state-of-the-art results on the Ego4D Goal-Step dataset with 35.18% Recall Top-1 at 0.3 IoU and 20.48% Recall Top-1 at 0.5 IoU on the test set.
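The abstract describes a prior that is applied at inference time on top of the model's moment predictions. The sketch below illustrates the general idea of Bayesian test-time refinement: multiply the model's scores by a prior and renormalize. It rests on several assumptions.

- The grounding model outputs per-frame start and end probabilities, as span-based models like VSLNet do.
- The prior is hypothetical. Under it, the k-th of K listed steps is expected near normalized time (k + 0.5)/K.
- The Gaussian shape, `sigma`, the uniform floor `eps` (a stand-in for keeping repeated steps possible), and the helpers `order_prior` and `refine_span` are illustrative.

None of these is the authors' exact formulation.

```python
import numpy as np


def order_prior(num_frames: int, step_idx: int, num_steps: int,
                sigma: float = 0.15, eps: float = 0.1) -> np.ndarray:
    """HYPOTHETICAL temporal-order prior over frames (step_idx is 0-based).

    Assumes the step tends to occur near normalized time
    (step_idx + 0.5) / num_steps. A Gaussian bump at that position is mixed
    with a uniform floor (weight eps) so the step can still be found
    elsewhere, e.g. when it repeats. Returns a distribution summing to 1.
    """
    t = (np.arange(num_frames) + 0.5) / num_frames   # normalized frame centres
    mu = (step_idx + 0.5) / num_steps                 # expected position
    bump = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    bump /= bump.sum()
    return (1.0 - eps) * bump + eps / num_frames


def refine_span(p_start, p_end, prior, max_len=None):
    """Combine model boundary scores with a prior (Bayes' rule up to a
    normalizing constant) and return the best (start, end) frames, start <= end."""
    post_s = np.asarray(p_start, dtype=float) * prior
    post_e = np.asarray(p_end, dtype=float) * prior
    post_s /= post_s.sum()
    post_e /= post_e.sum()
    scores = np.triu(np.outer(post_s, post_e))        # zero out end < start
    if max_len is not None:
        scores = np.tril(scores, k=max_len)           # zero out end - start > max_len
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j)


# Toy model output over 10 frames: an early candidate (frames 1-2) scored
# slightly higher than a late one (frames 7-8).
p_start = np.full(10, 0.02)
p_start[1], p_start[7] = 0.45, 0.40
p_end = np.full(10, 0.02)
p_end[2], p_end[8] = 0.45, 0.40

print(refine_span(p_start, p_end, np.ones(10)))  # (1, 2): model alone
print(refine_span(p_start, p_end, order_prior(10, step_idx=3, num_steps=4)))  # (7, 8)
```

In this toy case the step is listed last, so the prior moves the choice to the late candidate. The uniform floor keeps the early candidate possible rather than ruling it out, which loosely mirrors the motivation of handling repeated actions.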
