Mamba-OTR: a Mamba-based Solution for Online Take and Release Detection from Untrimmed Egocentric Video
Alessandro Sebastiano Catinello, Giovanni Maria Farinella, Antonino Furnari

TL;DR
This paper introduces Mamba-OTR, an online method for detecting object take and release events in untrimmed egocentric video. It reaches an mp-mAP of 45.48 in sliding-window mode and is efficient enough for real-time online use.
Contribution
We propose Mamba-OTR, a model built on the Mamba architecture, together with a training pipeline that counters label imbalance with the focal loss and adds a regularization scheme that aligns model predictions with the evaluation metric, improving online detection performance.
Findings
Mamba-OTR achieves an mp-mAP of 45.48 in sliding-window mode.
It outperforms transformer-based approaches and vanilla Mamba in accuracy.
The method is efficient enough for real-time online applications.
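As a rough illustration of what "sliding-window mode" can mean for a recurrent model, the sketch below contrasts it with a streaming mode that carries the state across frames. This is a toy under stated assumptions, not the paper's implementation: `step` is a hypothetical scalar recurrence standing in for a state-space layer, and `decay`, `window`, and the function names are all invented for the example.

```python
def step(state, x, decay=0.9):
    """One recurrent update (hypothetical toy stand-in for a state-space layer)."""
    return decay * state + (1.0 - decay) * x


def streaming_scores(frames, decay=0.9):
    """Carry the recurrent state across the whole video: one update per frame."""
    state, scores = 0.0, []
    for x in frames:
        state = step(state, x, decay)
        scores.append(state)
    return scores


def sliding_window_scores(frames, window, decay=0.9):
    """Re-run the recurrence from a zero state over the last `window` frames."""
    scores = []
    for t in range(len(frames)):
        state = 0.0  # state is reset for every window
        for x in frames[max(0, t - window + 1): t + 1]:
            state = step(state, x, decay)
        scores.append(state)  # keep only the output for the newest frame
    return scores


frames = [1.0, 0.0, 0.0, 0.0]
stream = streaming_scores(frames)
windowed = sliding_window_scores(frames, window=2)
```

In this toy, the streaming variant performs one update per frame, while the sliding-window variant performs up to `window` updates per frame and forgets anything older than the window. When the window covers the whole sequence, the two produce the same outputs.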
Abstract
This work tackles the problem of Online detection of Take and Release (OTR) of an object in untrimmed egocentric videos. This task is challenging due to severe label imbalance, with temporally sparse positive annotations, and the need for precise temporal predictions. Furthermore, methods need to be computationally efficient in order to be deployed in real-world online settings. To address these challenges, we propose Mamba-OTR, a model based on the Mamba architecture. Mamba-OTR is designed to exploit temporal recurrence during inference while being trained on short video clips. To address label imbalance, our training pipeline incorporates the focal loss and a novel regularization scheme that aligns model predictions with the evaluation metric. Extensive experiments on EPIC-KITCHENS-100, the comparisons with transformer-based approach, and the evaluation of different training and test…
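The focal loss mentioned in the abstract can be sketched as follows for a binary per-frame label. This is a minimal version of the standard formulation, not the authors' training code; the values `gamma=2.0` and `alpha=0.25` are common defaults assumed for illustration, and the function names are hypothetical.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one predicted probability p and one label y in {0, 1}.

    gamma and alpha are assumed defaults, not values taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of already well-classified frames
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a clip of per-frame predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With temporally sparse positives, most frames are easy negatives. The modulating factor `(1 - p_t) ** gamma` keeps their combined loss from dominating the few positive frames, and setting `gamma=0` recovers a class-weighted cross-entropy.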
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Digital Games and Media
