Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022

Yin-Dong Zheng; Guo Chen; Jiahao Wang; Tong Lu; Limin Wang

arXiv:2211.08728 · cs.CV · November 17, 2022

TL;DR

This paper presents a method using heterogeneous backbones for classifying object state changes and localizing their temporal boundaries in videos, achieving top performance in the Ego4D challenge.

Contribution

It introduces a novel approach combining CSN and VideoMAE backbones for improved accuracy in human-object interaction understanding.

Findings

01 Achieved 0.796 accuracy on OSCC

02 Achieved 0.516 absolute temporal localization error on PNR

03 Ranked 1st on the Ego4D OSCC & PNR-TL Challenge 2022 leaderboard

Abstract

Capturing the state changes of interacting objects is a key technology for understanding human-object interactions. This technical report describes our method using heterogeneous backbones for the Ego4D Object State Change Classification (OSCC) and PNR Temporal Localization (PNR-TL) Challenge. In the challenge, we used two heterogeneous video understanding backbones: CSN, which uses 3D convolution as its operator, and VideoMAE, which uses the Transformer as its operator. Our method achieves an accuracy of 0.796 on OSCC and an absolute temporal localization error of 0.516 on PNR. These results rank 1st on the leaderboard of the Ego4D OSCC & PNR-TL Challenge 2022.
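The two reported numbers are different kinds of metric: OSCC is scored as classification accuracy (higher is better), while PNR is scored as an absolute temporal localization error (lower is better). The sketch below illustrates how such metrics are commonly computed. It is not the challenge's official evaluation code: the function names `oscc_accuracy` and `pnr_abs_error` are hypothetical, the error is assumed to be the mean absolute difference between predicted and annotated timestamps, and the inputs are made-up toy values.

```python
from typing import Sequence


def oscc_accuracy(pred_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of clips whose predicted state-change label matches the annotation."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def pnr_abs_error(pred_times: Sequence[float], true_times: Sequence[float]) -> float:
    """Mean absolute difference between predicted and annotated PNR timestamps.

    Assumption: both sequences use the same time unit and the per-clip
    errors are averaged; the official protocol may differ in detail.
    """
    if len(pred_times) != len(true_times) or not true_times:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_times, true_times))
    return total / len(true_times)


# Toy values for illustration only (not challenge data).
acc = oscc_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
err = pnr_abs_error([1.0, 2.5, 4.0], [1.5, 2.5, 3.0])
print(f"accuracy={acc:.3f}, abs_error={err:.3f}")
```

Under these assumptions, an OSCC score of 0.796 would correspond to roughly four in five clips being classified correctly, and a PNR score of 0.516 to an average offset of about half a time unit between the predicted and annotated moments.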

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Residual Connection · Softmax · Adam