Joint Event Detection and Description in Continuous Video Streams
Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

TL;DR
This paper introduces JEDDi-Net, an end-to-end model for dense video captioning that localizes and describes events in continuous video streams, with improved results on the large-scale ActivityNet Captions dataset.
Contribution
The paper presents a novel joint network that simultaneously detects and captions events in videos using hierarchical captioning and shared feature encoding.
Findings
Improved results on the ActivityNet Captions dataset.
First dense captioning results on the TACoS-MultiLevel dataset.
Effective modeling of temporal relationships between events and captions.
Abstract
Dense video captioning is a fine-grained video understanding task that involves two sub-problems: localizing distinct events in a long video stream, and generating captions for the localized events. We propose the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense video captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and generates their captions. Proposal features are extracted within each proposal segment through 3D Segment-of-Interest pooling from shared video feature encoding. In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context. On the large-scale…
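The Segment-of-Interest pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the shared encoding has already been reduced to a (T, C) array of per-timestep features (the paper pools from 3D convolutional feature maps), and the function name `soi_pool` and the parameter `num_bins` are hypothetical. It shows only the core idea, which is that a proposal of any length is max-pooled into a fixed number of temporal bins so that later layers receive a fixed-size input.

```python
import numpy as np

def soi_pool(features, start, end, num_bins=4):
    """Max-pool the temporal segment [start, end) of a (T, C) feature
    array into a fixed-size (num_bins, C) array.

    Simplified, temporal-only illustration; names are hypothetical.
    """
    T, C = features.shape
    # Clamp the proposal to the valid range and keep it non-empty.
    start = max(0, min(start, T - 1))
    end = max(start + 1, min(end, T))
    # Split the segment into num_bins equal-width temporal bins.
    edges = np.linspace(start, end, num_bins + 1)
    pooled = np.empty((num_bins, C), dtype=features.dtype)
    for b in range(num_bins):
        lo = int(np.floor(edges[b]))
        # Each bin covers at least one timestep, even for short segments.
        hi = min(T, max(lo + 1, int(np.ceil(edges[b + 1]))))
        pooled[b] = features[lo:hi].max(axis=0)
    return pooled

# Toy shared encoding: 8 timesteps, 2 channels.
shared = np.arange(16, dtype=np.float32).reshape(8, 2)
long_proposal = soi_pool(shared, 2, 6, num_bins=2)
short_proposal = soi_pool(shared, 3, 4, num_bins=4)
print(long_proposal.shape, short_proposal.shape)
```

Because every proposal reads from the same `shared` array, the video is encoded once and each proposal only pays for the pooling, which is the motivation for a shared feature encoding.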
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
