QueryMamba: A Mamba-Based Encoder-Decoder Architecture with a Statistical Verb-Noun Interaction Module for Video Action Forecasting @ Ego4D Long-Term Action Anticipation Challenge 2024

Zeyun Zhong; Manuel Martin; Frederik Diederichs; Juergen Beyerer

arXiv:2407.04184 · cs.CV · July 8, 2024 · 1 citation

Open Access

TL;DR

QueryMamba is a Mamba-based encoder-decoder architecture for video action forecasting. Its verb-noun interaction module uses a statistical verb-noun co-occurrence matrix to account for which verbs and nouns occur together. The method placed second in the Ego4D Long-Term Action Anticipation (LTA) Challenge 2024 and ranked first in noun prediction accuracy.

Contribution

The report proposes QueryMamba, a Mamba-based encoder-decoder model whose verb-noun interaction module uses a statistical verb-noun co-occurrence matrix to improve video action forecasting.

Findings

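1. Achieved second place in the Ego4D LTA challenge.
2. Ranked first in noun prediction accuracy.
3. Considering the joint occurrence of verbs and nouns, in addition to predicting each from historical data, is reported to improve forecast accuracy.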

Abstract

This report presents a novel Mamba-based encoder-decoder architecture, QueryMamba, featuring an integrated verb-noun interaction module that utilizes a statistical verb-noun co-occurrence matrix to enhance video action forecasting. This architecture not only predicts verbs and nouns likely to occur based on historical data but also considers their joint occurrence to improve forecast accuracy. The efficacy of this approach is substantiated by experimental results, with the method achieving second place in the Ego4D LTA challenge and ranking first in noun prediction accuracy.
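The abstract does not say how the co-occurrence matrix enters the model, so the following is only a minimal sketch of one common way to use such statistics: treat smoothed co-occurrence counts as a prior and rerank the independent verb and noun probabilities with it. It is not the authors' implementation. The function names, the smoothing value, the toy label indices, and the multiplicative combination are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical helpers for illustration; not taken from the QueryMamba report.

def cooccurrence_prior(pairs, num_verbs, num_nouns, smoothing=0.1):
    """Build a smoothed, normalized verb-noun co-occurrence matrix.

    pairs: iterable of (verb_index, noun_index) tuples from training labels.
    smoothing: additive constant so unseen pairs keep a small nonzero prior.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    total = len(pairs) + smoothing * num_verbs * num_nouns
    return [
        [(counts[(v, n)] + smoothing) / total for n in range(num_nouns)]
        for v in range(num_verbs)
    ]

def joint_scores(verb_probs, noun_probs, prior):
    """Combine independent verb and noun probabilities with the prior.

    Assumed rule: score(v, n) is proportional to p(v) * p(n) * prior[v][n].
    """
    raw = [
        [verb_probs[v] * noun_probs[n] * prior[v][n]
         for n in range(len(noun_probs))]
        for v in range(len(verb_probs))
    ]
    total = sum(map(sum, raw))
    return [[x / total for x in row] for row in raw]

def best_action(scores):
    """Return the (verb_index, noun_index) pair with the highest score."""
    cells = ((v, n) for v in range(len(scores)) for n in range(len(scores[0])))
    return max(cells, key=lambda vn: scores[vn[0]][vn[1]])

# Toy labels (assumed): verbs 0=cut, 1=pour; nouns 0=onion, 1=water.
training_pairs = [(0, 0)] * 4 + [(1, 1)] * 2
prior = cooccurrence_prior(training_pairs, num_verbs=2, num_nouns=2)

verb_probs = [0.55, 0.45]  # classifier slightly prefers "cut"
noun_probs = [0.45, 0.55]  # classifier slightly prefers "water"

independent = [[v * n for n in noun_probs] for v in verb_probs]
reranked = joint_scores(verb_probs, noun_probs, prior)

print(best_action(independent))  # (0, 1): "cut water", never seen in training
print(best_action(reranked))     # (0, 0): "cut onion", the frequent pairing
```

In this toy case the independent prediction picks a verb-noun pair that never appears in the training labels, while the reranked prediction falls back to a pairing that does. That is the kind of correction a co-occurrence statistic can provide, whatever form the interaction module takes in the actual model.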

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications