Deep RNN Framework for Visual Sequential Applications
Bo Pang, Kaiwen Zha, Hanwen Cao, Chen Shi, Cewu Lu

TL;DR
This paper introduces a deep RNN framework, built on a new recurrent module and a new training scheme, that stays easy to train when stacked deep and yields substantial accuracy gains on visual sequence tasks.
Contribution
The paper proposes a new deep RNN architecture with the Context Bridge Module and Overlap Coherence Training Scheme, enabling effective training of very deep RNNs for visual sequences.
Findings
Achieves over 11% relative improvement in video classification accuracy.
Improves auxiliary annotation performance by 14.7%.
Enhances video prediction metrics by 2.4% on PSNR and SSIM.
Abstract
Extracting temporal and representation features efficiently plays a pivotal role in understanding visual sequence information. To deal with this, we propose a new recurrent neural framework that can be stacked deep effectively. There are mainly two novel designs in our deep RNN framework: one is a new RNN module called Context Bridge Module (CBM) which splits the information flowing along the sequence (temporal direction) and along depth (spatial representation direction), making it easier to train when building deep by balancing these two directions; the other is the Overlap Coherence Training Scheme that reduces the training complexity for long visual sequential tasks on account of the limitation of computing resources. We provide empirical evidence to show that our deep RNN framework is easy to optimize and can gain accuracy from the increased depth on several visual sequence…
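The abstract states only the goal of the Overlap Coherence Training Scheme, not its formulation. The following is a minimal sketch of one plausible reading, under our own assumptions: a long sequence is cut into fixed-length clips that share a few frames, each clip is run separately, and a penalty measures how much consecutive clips disagree on the frames they share. `split_overlapping`, `run_clip`, and `overlap_coherence_loss` are hypothetical names. A plain tanh RNN stands in for a deep CBM stack, and the mean-squared form of the penalty is assumed, not taken from the paper.

```python
import numpy as np


def split_overlapping(seq_len, clip_len, overlap):
    """Hypothetical helper: (start, end) frame ranges of clips covering
    range(seq_len), where consecutive clips share `overlap` frames."""
    if not 0 <= overlap < clip_len:
        raise ValueError("need 0 <= overlap < clip_len")
    stride = clip_len - overlap
    last_start = max(seq_len - overlap, 1)
    return [(s, min(s + clip_len, seq_len)) for s in range(0, last_start, stride)]


def run_clip(x, W, U):
    """Stand-in recurrent model (a plain tanh RNN, NOT the CBM). Each clip
    starts from a zero hidden state, as if processed independently."""
    h = np.zeros(U.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h)
        outputs.append(h)
    return np.stack(outputs)


def overlap_coherence_loss(clips, outputs):
    """Assumed coherence term: mean squared disagreement between consecutive
    clips on the frames they share (not the paper's equation)."""
    losses = []
    for (s0, e0), (s1, _), y0, y1 in zip(clips, clips[1:], outputs, outputs[1:]):
        shared = e0 - s1  # number of frames seen by both clips
        if shared <= 0:
            continue
        a = y0[s1 - s0:e0 - s0]  # clip 0's outputs on the shared frames
        b = y1[:shared]          # clip 1's outputs on the same frames
        losses.append(np.mean((a - b) ** 2))
    return float(np.mean(losses)) if losses else 0.0


# Toy usage: 10 frames with 3 features each, hidden size 8.
rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 3))
W = rng.normal(scale=0.5, size=(8, 3))
U = rng.normal(scale=0.5, size=(8, 8))
clips = split_overlapping(len(seq), clip_len=4, overlap=2)
outs = [run_clip(seq[s:e], W, U) for s, e in clips]
print(clips)  # [(0, 4), (2, 6), (4, 8), (6, 10)]
print(overlap_coherence_loss(clips, outs))  # positive: fresh zero states disagree on shared frames
```

In this toy setup, restarting every clip from a zero state is what makes the shared frames disagree. A penalty of this kind gives training a signal to reduce that disagreement while each step only processes one short clip, which is consistent with the abstract's stated aim of limiting the compute needed for long sequences.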
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
