HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning

Minkuk Kim; Hyeon Bae Kim; Jinyoung Moon; Jinwoo Choi; Seong Tae Kim

arXiv:2412.14585·cs.CV·December 20, 2024

TL;DR

This paper introduces HiCM$^2$, a hierarchical compact memory model inspired by human cognition. Its hierarchical memory recall improves dense video captioning and achieves state-of-the-art results on benchmark datasets.

Contribution

The paper proposes a hierarchical memory structure and a hierarchical memory reading module for dense video captioning, both inspired by the human memory hierarchy. The memory is kept compact by clustering memory events and summarizing them with large language models.

Findings

1. Achieves state-of-the-art performance on the YouCook2 dataset.
2. Improves dense video captioning accuracy through hierarchical memory recall.
3. Demonstrates the effectiveness of the memory clustering and summarization methods.

Abstract

With the growing demand for solutions to real-world video challenges, interest in dense video captioning (DVC) has been on the rise. DVC involves the automatic captioning and localization of untrimmed videos. Several studies highlight the challenges of DVC and introduce improved methods utilizing prior knowledge, such as pre-training and external memory. In this research, we propose a model that leverages the prior knowledge of human-oriented hierarchical compact memory inspired by human memory hierarchy and cognition. To mimic human-like memory recall, we construct a hierarchical memory and a hierarchical memory reading module. We build an efficient hierarchical compact memory by employing clustering of memory events and summarization using large language models. Comparative experiments demonstrate that this hierarchical memory recall process improves the performance of DVC by…
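
The abstract describes two pieces: a memory built by clustering events and summarizing each cluster, and a reading module that recalls from that memory hierarchically. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`cluster`, `build_hierarchy`, `read_memory`), the toy 2-D vectors standing in for event embeddings, the small k-means routine, and the use of a cluster mean in place of the large-language-model summarization mentioned in the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(vectors, k, iters=10):
    """Tiny deterministic k-means under cosine similarity (illustrative only)."""
    # Farthest-first initialisation: start from the first vector, then keep
    # adding the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    assign = []
    for _ in range(iters):
        assign = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = mean(members)
    return centroids, assign


def build_hierarchy(event_vecs, cluster_sizes):
    """Stack memory levels bottom-up; level 0 holds the raw event vectors.

    Each higher level stores one vector per cluster plus the indices of its
    children in the level below. The cluster mean is a stand-in (assumption)
    for the LLM-written summary described in the abstract.
    """
    levels = [{"vecs": event_vecs, "children": None}]
    for k in cluster_sizes:
        below = levels[-1]["vecs"]
        centroids, assign = cluster(below, k)
        nodes = []
        for c in range(k):
            kids = [i for i, a in enumerate(assign) if a == c]
            if kids:  # drop clusters that ended up empty
                nodes.append((centroids[c], kids))
        levels.append({
            "vecs": [vec for vec, _ in nodes],
            "children": [kids for _, kids in nodes],
        })
    return levels


def read_memory(levels, query, top_k=1):
    """Top-down recall: pick the best nodes at the coarsest level, then
    search only among their children at each finer level."""
    candidates = list(range(len(levels[-1]["vecs"])))
    path = []
    for depth in range(len(levels) - 1, -1, -1):
        vecs = levels[depth]["vecs"]
        ranked = sorted(
            candidates, key=lambda i: cosine(query, vecs[i]), reverse=True
        )[:top_k]
        path.append((depth, ranked))
        if depth > 0:
            candidates = [j for i in ranked for j in levels[depth]["children"][i]]
    return path


# Toy "event embeddings": two visually obvious groups.
events = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
memory = build_hierarchy(events, cluster_sizes=[2])
print(read_memory(memory, query=[0.1, 0.9]))
```

The returned path lists, for each level from coarsest to finest, the indices of the nodes that were recalled. Restricting each step to the children of the previously selected nodes is what makes the read hierarchical: the query is compared against a few summaries first and against individual events only within the chosen clusters.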

Code & Models

Repositories

ailab-kyunghee/HiCM2-DVC
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization