A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset
Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Size Li, Ruiji Fu, Zhongyuan Wang

TL;DR
This paper introduces a new heterogeneous dataset combining multi-modal video entities and common sense relations, and proposes an end-to-end model that integrates video understanding with knowledge graph embedding to improve retrieval and inference tasks.
Contribution
It creates a novel dataset for joint video understanding and knowledge embedding, and develops a unified model that enhances content retrieval and knowledge inference performance.
Findings
Knowledge-enhanced video embeddings improve retrieval accuracy.
The model outperforms traditional KGE methods on new inference tasks.
Joint optimization benefits both video understanding and knowledge embedding.
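The page does not state the training objective behind "joint optimization". A common way to train two components together is to add a weighted knowledge-embedding loss to the video-task loss. The following is a minimal sketch under that assumption only. It scores triples with a TransE-style distance and a margin ranking loss. The names `transe_distance`, `margin_kge_loss`, `joint_loss` and the `weight` parameter are hypothetical illustrations, not the paper's formulation.

```python
import math
from typing import Sequence


def transe_distance(head: Sequence[float], rel: Sequence[float], tail: Sequence[float]) -> float:
    """TransE-style L2 distance ||h + r - t|| (an assumption); smaller = more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


def margin_kge_loss(pos_dist: float, neg_dist: float, margin: float = 1.0) -> float:
    """Hinge loss: push a true triple at least `margin` closer than a corrupted one."""
    return max(0.0, margin + pos_dist - neg_dist)


def joint_loss(video_task_loss: float, kge_loss: float, weight: float = 0.5) -> float:
    """Hypothetical joint objective: video-understanding loss plus a weighted KGE loss."""
    return video_task_loss + weight * kge_loss


# Toy 2-d embeddings (made up for illustration): one video, one relation, a true and a corrupted tag.
video, relation = [0.0, 1.0], [1.0, 0.0]
pos = transe_distance(video, relation, [1.0, 1.0])   # true tag
neg = transe_distance(video, relation, [1.5, 1.0])   # corrupted tag
total = joint_loss(video_task_loss=0.8, kge_loss=margin_kge_loss(pos, neg))
print(pos, neg, total)
```

Because the video embedding appears in both terms, gradients from the knowledge-embedding loss would also update the video encoder in such a setup. That is one plausible reading of why joint training could help both sides.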
Abstract
Video understanding is an important task on short-video business platforms, with wide applications in video recommendation and classification. Most existing video understanding work focuses only on information that appears within the video content, including the video frames, audio, and text. However, introducing common sense knowledge from an external Knowledge Graph (KG) dataset is essential for video understanding when referring to content that is less directly related to the video. Owing to the lack of video knowledge graph datasets, work that integrates video understanding and KGs is rare. In this paper, we propose a heterogeneous dataset that contains multi-modal video entities and rich common sense relations. This dataset also provides multiple novel video inference tasks, such as the Video-Relation-Tag (VRT) and Video-Relation-Video (VRV) tasks. Furthermore,…
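The VRT and VRV tasks frame inference as completing a triple: given a video and a relation, find the most plausible tag (VRT) or video (VRV). The abstract does not say how candidates are scored. This sketch assumes a TransE-style score, the negative of ||h + r - t||, as a stand-in. The function names and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def transe_score(head: Vector, rel: Vector, tail: Vector) -> float:
    """Negative L2 distance -||h + r - t|| (TransE-style, an assumption); higher = more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


def rank_tails(head: Vector, rel: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Rank candidate tails (tags for VRT, videos for VRV) from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_score(head, rel, candidates[name]), reverse=True)


# VRT: (video, relation, ?tag) with made-up 2-d embeddings.
video = [0.0, 1.0]
has_tag = [1.0, 0.0]
tags = {"cooking": [1.0, 1.0], "travel": [-1.0, 2.0], "music": [3.0, 1.0]}
print(rank_tails(video, has_tag, tags))

# VRV: (video, relation, ?video) with made-up 2-d embeddings.
related_to = [0.5, 0.5]
videos = {"v2": [0.5, 1.5], "v3": [2.0, 2.0]}
print(rank_tails(video, related_to, videos))
```

In a real system the head embedding would come from a learned multi-modal video encoder rather than a fixed vector. Ranking metrics such as hits@k would then be computed over the sorted candidate list.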
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
