Multi-Perspective LSTM for Joint Visual Representation Learning

Alireza Sepas-Moghaddam; Fernando Pereira; Paulo Lobato Correia; Ali Etemad

arXiv:2105.02802·cs.CV·May 7, 2021

TL;DR

This paper introduces a new LSTM cell architecture designed for learning complex relationships in multi-perspective visual sequences, improving recognition accuracy in tasks like lip reading and face recognition.

Contribution

The paper proposes a novel recurrent joint learning strategy with additional gates and memories, enhancing visual representation learning from multiple perspectives.

Findings

01. Superior recognition accuracy over benchmarks

02. Effective learning of intra- and inter-perspective relationships

03. Reduced complexity compared to existing methods

Abstract

We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered, and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in…
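
To make the idea of "additional gates and memories at the cell level" more concrete, here is a minimal sketch of a two-perspective recurrent cell in plain Python. It is not the paper's cell: the abstract does not give the equations, so everything below is an assumption made for illustration, including the names `TwoPerspectiveCell`, `lstm_update`, `affine`, and `init_layer`, the restriction to two perspectives, and the choice to drive the joint memory from the two per-perspective hidden states. Each perspective keeps its own gates and memory (intra-perspective), and a third set of gates and a shared memory read both hidden states (inter-perspective).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Compute W @ v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (an arbitrary choice for this sketch)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def lstm_update(params, v, c_prev):
    """Standard LSTM gating: sigmoid input/forget/output gates, tanh candidate."""
    i = [sigmoid(u) for u in affine(*params["i"], v)]
    f = [sigmoid(u) for u in affine(*params["f"], v)]
    o = [sigmoid(u) for u in affine(*params["o"], v)]
    g = [math.tanh(u) for u in affine(*params["g"], v)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


class TwoPerspectiveCell:
    """Hypothetical cell: one memory per perspective plus a shared joint memory."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        self.n_hid = n_hid
        # Intra-perspective gates see [x_p, h_p] for their own perspective only.
        self.intra = {
            p: {k: init_layer(rng, n_hid, n_in + n_hid) for k in self.GATES}
            for p in ("a", "b")
        }
        # Inter-perspective (joint) gates see both hidden states [h_a, h_b].
        self.joint = {k: init_layer(rng, n_hid, 2 * n_hid) for k in self.GATES}

    def zero_state(self):
        keys = ("h_a", "c_a", "h_b", "c_b", "h_j", "c_j")
        return {k: [0.0] * self.n_hid for k in keys}

    def step(self, x_a, x_b, s):
        """Advance one time step given one input vector per perspective."""
        h_a, c_a = lstm_update(self.intra["a"], x_a + s["h_a"], s["c_a"])
        h_b, c_b = lstm_update(self.intra["b"], x_b + s["h_b"], s["c_b"])
        # The joint memory c_j persists across time and fuses both perspectives.
        h_j, c_j = lstm_update(self.joint, h_a + h_b, s["c_j"])
        return {"h_a": h_a, "c_a": c_a, "h_b": h_b, "c_b": c_b,
                "h_j": h_j, "c_j": c_j}


# Usage: run a 5-step sequence with toy 3-dimensional inputs per perspective.
cell = TwoPerspectiveCell(n_in=3, n_hid=4)
state = cell.zero_state()
for t in range(5):
    x_a = [0.1 * t, 0.2, -0.3]   # toy features from perspective A
    x_b = [0.5, -0.1 * t, 0.4]   # toy features from perspective B
    state = cell.step(x_a, x_b, state)
joint_representation = state["h_j"]  # vector a classifier could consume
```

In this sketch the per-perspective hidden states depend only on their own input stream, while `h_j` depends on both, which is one simple way to separate intra- from inter-perspective information. The paper's cell may couple the gates and memories differently.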

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory