Manipulated Object Proposal: A Discriminative Object Extraction and Feature Fusion Framework for First-Person Daily Activity Recognition
Changzhi Luo, Bingbing Ni, Jun Yuan, Jianfeng Wang, Shuicheng Yan, Meng Wang

TL;DR
This paper introduces a novel framework for first-person activity recognition that uses manipulated object proposals based on motion cues to improve object feature discrimination and fuses object and motion features for better activity classification.
Contribution
The work presents a new manipulated object proposal generation scheme leveraging motion cues, enhancing object detection and feature representation in egocentric activity recognition.
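The abstract names motion boundary and motion magnitude as the cues behind the proposal scheme, contrasting them with camera motion. However, this excerpt is cut off before it gives the actual scoring rule. The sketch below is therefore a hypothetical illustration, not the authors' method. It assumes a dense optical-flow field is already available. It approximates camera (ego) motion crudely by the median flow vector and then ranks candidate boxes by a weighted mix of mean flow magnitude and mean motion-boundary strength. The function names and the weight `alpha` are illustrative assumptions.

```python
import numpy as np


def compensate_camera_motion(flow):
    """Crude camera-motion removal (assumption): subtract the median flow vector,
    treating the dominant motion in the frame as ego-motion of the wearer."""
    return flow - np.median(flow.reshape(-1, 2), axis=0)


def motion_cue_maps(flow):
    """Return per-pixel motion magnitude and motion-boundary strength.

    flow: array of shape (H, W, 2) holding (dx, dy) per pixel.
    Motion boundary is taken here as the summed spatial-gradient magnitude of
    the two flow components, so it peaks where neighboring pixels move differently.
    """
    magnitude = np.linalg.norm(flow, axis=2)
    boundary = np.zeros_like(magnitude)
    for c in range(2):
        gy, gx = np.gradient(flow[..., c])  # derivatives along rows, then columns
        boundary += np.hypot(gx, gy)
    return magnitude, boundary


def score_proposals(flow, boxes, alpha=0.5):
    """Score boxes (x0, y0, x1, y1), end-exclusive, by mean motion cues inside."""
    flow = compensate_camera_motion(flow)
    magnitude, boundary = motion_cue_maps(flow)
    scores = []
    for x0, y0, x1, y1 in boxes:
        mag = magnitude[y0:y1, x0:x1].mean()
        bnd = boundary[y0:y1, x0:x1].mean()
        scores.append(alpha * mag + (1.0 - alpha) * bnd)
    return np.array(scores)


def rank_proposals(flow, boxes, top_k=1, alpha=0.5):
    """Keep the top_k boxes with the highest motion-cue score."""
    scores = score_proposals(flow, boxes, alpha)
    order = np.argsort(-scores)[:top_k]
    return [boxes[i] for i in order], scores[order]


if __name__ == "__main__":
    # Synthetic example: a uniform camera pan plus one independently moving patch.
    flow = np.zeros((64, 64, 2))
    flow[..., 0] = 1.0               # camera pans right by 1 px everywhere
    flow[20:40, 20:40, 0] += 5.0     # hypothetical "manipulated object" moves faster
    boxes = [(0, 0, 15, 15), (20, 20, 40, 40), (35, 35, 55, 55)]
    print(rank_proposals(flow, boxes, top_k=2))
```

In this toy setup, subtracting the median flow cancels the uniform pan. The background box therefore scores zero, and the box tightly covering the moving patch ranks first. A real system would use a proper flow estimator and a stronger ego-motion model than a single median vector.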
Findings
Significantly outperforms state-of-the-art methods on a challenging benchmark.
Effective use of motion cues improves object proposal quality.
Enhanced feature fusion boosts recognition accuracy.
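The findings attribute part of the accuracy gain to fusing object and motion features, but the fusion scheme is not detailed in this summary. The sketch below shows two generic options under that caveat. The first is feature-level fusion, which L2-normalizes each descriptor and concatenates them. The second is score-level fusion, which takes a weighted average of per-class scores from two separate classifiers. The function names and weights are hypothetical, and neither option is claimed to be the paper's exact formulation.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v) + eps)


def fuse_features(object_feat, motion_feat, w_object=1.0, w_motion=1.0):
    """Feature-level fusion: normalize each modality so neither dominates by
    scale, weight it, then concatenate into one descriptor for a classifier."""
    return np.concatenate([w_object * l2_normalize(object_feat),
                           w_motion * l2_normalize(motion_feat)])


def late_fusion(object_scores, motion_scores, w=0.5):
    """Score-level fusion: weighted average of per-class scores."""
    return w * np.asarray(object_scores, dtype=float) + \
        (1.0 - w) * np.asarray(motion_scores, dtype=float)


if __name__ == "__main__":
    fused = fuse_features([3.0, 4.0], [0.0, 2.0, 0.0])
    scores = late_fusion([0.2, 0.8], [0.6, 0.4])
    print(fused, scores, int(np.argmax(scores)))
```

Normalizing before concatenation is a common choice because object and motion descriptors often differ greatly in dimension and scale. The weights would normally be tuned on validation data.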
Abstract
Detecting and recognizing objects interacting with humans lie at the center of first-person (egocentric) daily activity recognition. However, due to noisy camera motion and frequent changes in viewpoint and scale, most previous egocentric action recognition methods fail to capture and model highly discriminative object features. In this work, we propose a novel pipeline for first-person daily activity recognition, aiming at more discriminative object feature representation and object-motion feature fusion. Our object feature extraction and representation pipeline is inspired by the recent success of object hypotheses and deep convolutional neural network-based detection frameworks. Our key contribution is a simple yet effective manipulated object proposal generation scheme. This scheme leverages motion cues such as motion boundary and motion magnitude (in contrast, camera motion…
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Video Surveillance and Tracking Methods
