Improved Dense Trajectory with Cross Streams
Katsunori Ohnishi, Masatoshi Hidaka, Tatsuya Harada

TL;DR
This paper introduces a novel local descriptor that combines appearance and motion streams through cross-network pooling, improving action recognition accuracy by emphasizing discriminative regions and reducing background noise.
Contribution
It proposes a new local descriptor that pools, along improved dense trajectories, a convolutional layer obtained by crossing the appearance and motion networks, enhancing discriminative power for action recognition.
Findings
Achieved 92.3% accuracy on UCF101
Achieved 66.2% accuracy on HMDB51
Outperforms previous state-of-the-art methods
Abstract
Improved dense trajectories (iDT) have shown great performance in action recognition, and their combination with the two-stream approach has achieved state-of-the-art performance. It is, however, difficult for iDT to completely remove background trajectories from video with camera shaking. Trajectories in less discriminative regions should be given modest weights in order to create more discriminative local descriptors for action recognition. In addition, the two-stream approach, which learns appearance and motion information separately, cannot focus on motion in important regions when extracting features from spatial convolutional layers of the appearance network, and vice versa. In order to address the above-mentioned problems, we propose a new local descriptor that pools a new convolutional layer obtained from crossing two networks along iDT. This new descriptor is calculated by…
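The idea of pooling a cross-stream feature map along a trajectory can be pictured with a small sketch. The abstract is cut off before it states how the descriptor is calculated, so everything specific here is an assumption made for illustration rather than the authors' formula: the function name `cross_stream_descriptor`, the `(T, H, W, C)` array layout, the min-max-normalized channel sum used as a spatial weight map, and sum pooling over trajectory points are all hypothetical choices.

```python
import numpy as np

def cross_stream_descriptor(appearance_map, motion_map, trajectory):
    """Pool a cross-weighted feature map along one trajectory.

    appearance_map: (T, H, W, C) conv features from the appearance network
    motion_map:     (T, H, W, C2) conv features from the motion network
    trajectory:     iterable of (t, y, x) integer feature-map positions
    All shapes and the weighting scheme are illustrative assumptions.
    """
    # Assumed weighting: sum the motion channels, then min-max normalize
    # each frame to [0, 1] so that low-response regions get small weights.
    saliency = motion_map.sum(axis=-1)                    # (T, H, W)
    lo = saliency.min(axis=(1, 2), keepdims=True)
    hi = saliency.max(axis=(1, 2), keepdims=True)
    weights = (saliency - lo) / np.maximum(hi - lo, 1e-8)

    # "Cross" the streams: weight appearance features by motion saliency.
    crossed = appearance_map * weights[..., None]         # (T, H, W, C)

    # Sum-pool the crossed features at the points the trajectory visits.
    return np.sum([crossed[t, y, x] for t, y, x in trajectory], axis=0)

# Toy usage: two frames, a 2x2 feature map, three appearance channels.
appearance = np.ones((2, 2, 2, 3))
motion = np.tile(np.array([[0.0, 1.0], [2.0, 4.0]])[None, :, :, None], (2, 1, 1, 1))
desc = cross_stream_descriptor(appearance, motion, [(0, 1, 1), (1, 0, 1)])
```

In this toy case the trajectory visits one high-saliency point and one low-saliency point, so the second point contributes less to the pooled descriptor, which mirrors the abstract's goal of giving modest weights to trajectories in less discriminative regions. Swapping the roles of the two maps would give the symmetric motion-side variant.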
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
