Video captioning with stacked attention and semantic hard pull
Md. Mushfiqur Rahman, Thasin Abedin, Khondokar S. S. Prottoy, Ayana Moshruba, Fazlul Hasan Siddiqui

TL;DR
This paper introduces a novel video captioning architecture called SSVC that employs stacked attention and spatial hard pull to improve semantic accuracy, validated through both quantitative and qualitative evaluations.
Contribution
The paper proposes the SSVC model with innovative stacked attention and spatial hard pull mechanisms for enhanced semantic video captioning.
Findings
Improved BLEU scores over state-of-the-art models
Higher Semantic Sensibility (SS) scores in human evaluations
Effective combination of attention and hard pull techniques
Abstract
Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. The task of generating a semantically accurate description of a video is quite complex. Considering the complexity of the problem, the results obtained in recent research works are praiseworthy. However, there is plenty of scope for further investigation. This paper addresses this scope and proposes a novel solution. Most video captioning models comprise two sequential/recurrent layers - one as a video-to-context encoder and the other as a context-to-caption decoder. This paper proposes a novel architecture, namely Semantically Sensible Video Captioning (SSVC), which modifies the context generation mechanism by using two novel approaches - "stacked attention" and "spatial hard pull". As there are no…
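To make the encoder/decoder description above concrete, here is a minimal sketch of how a decoder can build a context vector from encoder outputs using plain dot-product attention. It is a generic, single-layer illustration only: it is not the paper's "stacked attention" or "spatial hard pull", whose details are not given in this excerpt. The function names `softmax` and `attend` and the toy frame values are hypothetical, chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(encoder_states, decoder_state):
    """Generic dot-product attention (an assumption, not the SSVC mechanism).

    Scores each encoder state by its dot product with the decoder state,
    normalizes the scores with softmax, and returns the weighted sum of
    encoder states (the context vector) together with the weights.
    """
    scores = [
        sum(h * d for h, d in zip(state, decoder_state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return context, weights


# Toy input: three made-up "frame" encodings of dimension 2.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # hypothetical decoder state
context, weights = attend(frames, query)
```

In this toy case the first and third frames receive equal weight, and both outweigh the second, because they align better with the query. A real captioning model would learn the encodings and typically use a learned scoring function rather than a raw dot product.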
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
