Video Imprint
Zhanning Gao, Le Wang, Nebojsa Jojic, Zhenxing Niu, Nanning Zheng, Gang Hua

TL;DR
The paper introduces ER3, a unified video analytics framework utilizing the novel video imprint representation to improve event recognition, recounting, and retrieval by exploiting temporal correlations and attention mechanisms.
Contribution
It proposes the video imprint representation and an integrated framework for event recognition, recounting, and retrieval, with a reasoning network capable of localization and evidence highlighting.
Findings
Enhanced event retrieval accuracy over state-of-the-art methods
Effective localization of key frames and areas within videos
Improved event recounting with evidence highlighting
Abstract
A new unified video analytics framework (ER3) is proposed for complex event retrieval, recognition and recounting. It is based on the proposed video imprint representation, which exploits temporal correlations among image features across video frames. With the video imprint representation, it is convenient to map back to both temporal and spatial locations in video frames, allowing for both key frame identification and key area localization within each frame. In the proposed framework, a dedicated feature alignment module removes redundancy across frames to produce the tensor representation, i.e., the video imprint. The video imprint is then fed separately into a reasoning network for event recognition/recounting and into a feature aggregation module for event retrieval. Thanks to its attention mechanism inspired by the memory…
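The abstract does not spell out how the alignment module, the attention, or the aggregation actually work. The toy Python sketch below illustrates, under our own assumptions, the three ideas it names: collapsing redundant per-frame features into one shared tensor, mapping attention weights back to frames and locations, and pooling the tensor into a single vector for retrieval. Nearest-prototype assignment, dot-product softmax attention, and sum pooling are hypothetical stand-ins, as are all function names (`align_to_imprint`, `attention`, `reverse_map`, `aggregate`). None of this is the paper's method.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_to_imprint(frames, prototypes):
    """Hypothetical alignment: snap each local feature to its nearest prototype
    cell and average everything that lands in the same cell, so near-duplicate
    content from different frames is stored once. Also records provenance."""
    dim = len(prototypes[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    provenance = defaultdict(list)  # cell -> [(frame_index, position), ...]
    for t, frame in enumerate(frames):
        for pos, feat in frame:
            c = min(range(len(prototypes)), key=lambda k: sq_dist(feat, prototypes[k]))
            sums[c] = [s + f for s, f in zip(sums[c], feat)]
            counts[c] += 1
            provenance[c].append((t, pos))
    imprint = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return imprint, dict(provenance)


def attention(imprint, query):
    """Softmax over dot-product scores between a query vector and each cell
    (an assumed stand-in for the paper's attention mechanism)."""
    cells = sorted(imprint)
    scores = [dot(query, imprint[c]) for c in cells]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    z = sum(exps)
    return {c: e / z for c, e in zip(cells, exps)}


def reverse_map(weights, provenance):
    """Spread each cell's attention evenly over the (frame, position) pairs
    that produced it; return the top frame and the top (frame, position)."""
    frame_mass = defaultdict(float)
    area_mass = defaultdict(float)
    for c, w in weights.items():
        share = w / len(provenance[c])
        for t, pos in provenance[c]:
            frame_mass[t] += share
            area_mass[(t, pos)] += share
    return max(frame_mass, key=frame_mass.get), max(area_mass, key=area_mass.get)


def aggregate(imprint):
    """Sum-pool the cells and L2-normalize, giving one vector per video so that
    retrieval reduces to ranking by dot product (cosine similarity)."""
    dim = len(next(iter(imprint.values())))
    pooled = [sum(v[i] for v in imprint.values()) for i in range(dim)]
    norm = math.sqrt(dot(pooled, pooled)) or 1.0
    return [x / norm for x in pooled]


# Toy video: 2-D features, two prototype cells.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
frames = [
    [("left", [0.9, 0.1])],                          # frame 0
    [("left", [1.0, 0.0]), ("right", [0.1, 0.9])],   # frame 1
]
imprint, provenance = align_to_imprint(frames, prototypes)
weights = attention(imprint, query=[0.0, 5.0])  # query favors the 2nd feature dimension
print(reverse_map(weights, provenance))  # (1, (1, 'right'))
video_vector = aggregate(imprint)
```

In this toy run, the two near-identical "left" features from frames 0 and 1 collapse into a single cell. Most of the attention mass falls on the "right" cell, and that cell maps back to frame 1 and its right-hand region. Keeping a provenance list per cell is what makes the reverse mapping cheap in this sketch. The paper may achieve it differently.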
