Detection-Fusion for Knowledge Graph Extraction from Videos
Taniya Das, Louis Mahon, Thomas Lukasiewicz

TL;DR
This paper introduces a deep learning method for extracting knowledge graphs from videos, aiming to improve semantic understanding without relying on natural language descriptions.
Contribution
The paper presents a novel deep learning approach for video annotation with knowledge graphs and extends it to incorporate background knowledge.
Findings
Effective prediction of pairs of individuals in videos
Successful relation prediction between entities
Inclusion of background knowledge enhances graph accuracy
Abstract
One of the challenging tasks in the field of video understanding is extracting semantic content from video inputs. Most existing systems use language models to describe videos in natural language sentences, but this has several major shortcomings. Such systems can rely too heavily on the language model component and base their output on statistical regularities in natural language text rather than on the visual contents of the video. Additionally, natural language annotations cannot be readily processed by a computer, are difficult to evaluate with performance metrics and cannot be easily translated into a different natural language. In this paper, we propose a method to annotate videos with knowledge graphs, and so avoid these problems. Specifically, we propose a deep-learning-based model for this task that first predicts pairs of individuals and then the relations between them.…
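The abstract's point that knowledge-graph annotations can be processed by a computer and scored with standard metrics can be illustrated with a minimal sketch. It assumes a graph is stored as a set of (subject, relation, object) triples and is scored by exact-match precision, recall, and F1. The `Fact` type, the `triple_f1` function, and the example facts are hypothetical and are not taken from the paper, which may use a different representation and different metrics.

```python
from __future__ import annotations

from typing import NamedTuple


class Fact(NamedTuple):
    """One edge of a knowledge graph: two individuals and the relation between them."""
    subject: str
    relation: str
    obj: str


def triple_f1(predicted: set[Fact], gold: set[Fact]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotation of a single clip (illustrative values only).
gold = {Fact("man", "ride", "horse"), Fact("horse", "in", "field")}
predicted = {Fact("man", "ride", "horse"), Fact("man", "in", "field")}

precision, recall, f1 = triple_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because each fact is a discrete tuple, comparing a prediction against a reference reduces to set intersection. Two natural language captions cannot be compared this directly, since the same content can be worded in many ways.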
Taxonomy
Topics: Advanced Graph Neural Networks · Cognitive Computing and Networks · Graph Theory and Algorithms
Methods: Balanced Selection
