VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities

Shusaku Egami; Takahiro Ugai; Swe Nwe Nwe Htun; Ken Fukuda

arXiv:2408.14895 · cs.AI · August 29, 2024


TL;DR

This paper introduces VHAKG, a multi-modal knowledge graph built from synchronized multi-view videos of daily activities, capturing detailed event and frame-level information to support knowledge processing and model benchmarking.

Contribution

The paper presents a novel MMKG constructed from synchronized multi-view simulated videos of daily activities. Beyond event-centric knowledge, it records fine-grained, frame-level details such as bounding boxes, and it ships with support tools for querying the graph.
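To make the combination of event-centric and frame-level knowledge concrete, here is a minimal sketch of how such a graph might be traversed. It assumes a toy in-memory list of (subject, predicate, object) triples; the predicate names (`hasFrame`, `camera`, `frameNumber`, `hasAnnotation`, `object`, `bbox`), the identifiers, the `x,y,w,h` box format, and the helper `bounding_boxes` are all hypothetical and are not the actual VHAKG schema or its query tools. If the graph were stored as RDF, the same question would typically be posed as a SPARQL basic graph pattern; the wildcard matcher below only mimics that idea in plain Python.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical triple layout: (subject, predicate, object) strings.
# Predicate names below are illustrative, not the real VHAKG vocabulary.
Triple = tuple[str, str, str]


@dataclass
class TripleStore:
    triples: list[Triple]

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


def bounding_boxes(store: TripleStore, event: str, obj: str) -> list[tuple[str, int, str]]:
    """Return sorted (camera, frame number, bbox) rows for `obj` in the frames of `event`."""
    rows = []
    for _, _, frame in store.match(s=event, p="hasFrame"):
        camera = next(store.match(s=frame, p="camera"))[2]
        number = int(next(store.match(s=frame, p="frameNumber"))[2])
        for _, _, ann in store.match(s=frame, p="hasAnnotation"):
            # Keep only annotations that refer to the requested object.
            if next(store.match(s=ann, p="object", o=obj), None) is not None:
                bbox = next(store.match(s=ann, p="bbox"))[2]
                rows.append((camera, number, bbox))
    return sorted(rows)


# Toy data (hypothetical): one "grab cup" event seen from two synchronized cameras at frame 0.
store = TripleStore([
    ("event1", "action", "grab"),
    ("event1", "mainObject", "cup"),
    ("event1", "hasFrame", "frame_cam0_0"),
    ("event1", "hasFrame", "frame_cam1_0"),
    ("frame_cam0_0", "camera", "cam0"),
    ("frame_cam0_0", "frameNumber", "0"),
    ("frame_cam0_0", "hasAnnotation", "ann1"),
    ("ann1", "object", "cup"),
    ("ann1", "bbox", "120,80,40,40"),
    ("frame_cam1_0", "camera", "cam1"),
    ("frame_cam1_0", "frameNumber", "0"),
    ("frame_cam1_0", "hasAnnotation", "ann2"),
    ("ann2", "object", "cup"),
    ("ann2", "bbox", "310,95,38,41"),
])
print(bounding_boxes(store, "event1", "cup"))
# [('cam0', 0, '120,80,40,40'), ('cam1', 0, '310,95,38,41')]
```

Because both frames share frame number 0 but come from different cameras, the result shows how synchronized multi-view frames of a single event can be lined up side by side.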

Findings

01

Facilitates benchmarking of vision-language models by supplying task-specific vision-language datasets.

02

Includes fine-grained frame-by-frame changes, such as bounding boxes within video frames.

03

Supports querying and knowledge processing.

Abstract

Multi-modal knowledge graphs (MMKGs), which ground various non-symbolic data (e.g., images and videos) into symbols, have attracted attention as resources enabling knowledge processing and machine learning across modalities. However, the construction of MMKGs for videos consisting of multiple events, such as daily activities, is still in the early stages. In this paper, we construct an MMKG based on synchronized multi-view simulated videos of daily activities. Besides representing the content of daily life videos as event-centric knowledge, our MMKG also includes frame-by-frame fine-grained changes, such as bounding boxes within video frames. In addition, we provide support tools for querying our MMKG. As an application example, we demonstrate that our MMKG facilitates benchmarking vision-language models by providing the necessary vision-language datasets for a tailored task.
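The benchmarking application can be pictured as turning event annotations into question-answer items that a vision-language model must answer from the corresponding video clip. The sketch below assumes a hypothetical multiple-choice action-recognition task; the `Event` fields, the question template, the action labels, and `make_qa_items` are illustrative inventions, not the tailored task or dataset format used in the paper.

```python
import random
from dataclasses import dataclass


# Hypothetical event record; field names are illustrative, not VHAKG's schema.
@dataclass(frozen=True)
class Event:
    video_id: str
    start_frame: int
    end_frame: int
    action: str
    obj: str


def make_qa_items(events, distractor_pool, n_choices=4, seed=0):
    """Build multiple-choice QA items: the event's action is the correct answer,
    and the remaining choices are sampled from the other actions in the pool."""
    rng = random.Random(seed)  # fixed seed keeps the generated dataset reproducible
    items = []
    for ev in events:
        wrong = [a for a in distractor_pool if a != ev.action]
        choices = rng.sample(wrong, n_choices - 1) + [ev.action]
        rng.shuffle(choices)
        items.append({
            "video": ev.video_id,
            "frames": (ev.start_frame, ev.end_frame),  # clip the model is shown
            "question": f"What does the person do with the {ev.obj}?",
            "choices": choices,
            "answer": choices.index(ev.action),  # index of the correct choice
        })
    return items


# Toy events (hypothetical values).
events = [
    Event("video_001", 0, 45, "grab", "cup"),
    Event("video_001", 46, 90, "put", "cup"),
]
pool = ["grab", "put", "open", "close", "walk", "sit"]
items = make_qa_items(events, pool)
print(items[0]["question"])  # What does the person do with the cup?
```

Fixing the seed makes the generated benchmark reproducible, and drawing distractors from a pool of real action labels keeps the wrong options plausible rather than trivially rejectable.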


