Social Fabric: Tubelet Compositions for Video Relation Detection
Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

TL;DR
This paper introduces Social Fabric, an encoding that represents each pair of object tubelets in a video as a composition of learned interaction primitives. Relations are classified and detected on tubelet pairs from the start, rather than after the tubelets have been processed as separate entities, which improves accuracy.
Contribution
The paper proposes a primitive-based encoding for pairs of object tubelets and a two-stage network for video relation detection built on that encoding.
Findings
Achieves state-of-the-art results on two benchmarks.
Enables query-by-primitive-example for relation search.
Demonstrates benefits of early relation modeling and primitive composition.
Abstract
This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely interacting. We use the Social Fabric in the second stage to simultaneously fine-tune and…
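The idea of encoding a tubelet pair as a composition of shared primitives can be illustrated with a small sketch. This is not the authors' implementation. It assumes each pair is described by a hypothetical per-frame feature matrix of shape (T, D), that the K primitives are D-dimensional vectors, and that frames are softly assigned to primitives with a softmax over dot products, with residuals aggregated over time. The names soft_assign and encode_pair, the temperature parameter, and the random data are all illustrative assumptions.

```python
import numpy as np

def soft_assign(features, primitives, temperature=1.0):
    """Soft-assign each frame feature to the primitives.

    features:   (T, D) per-frame features of one tubelet pair (assumed input)
    primitives: (K, D) primitive vectors (learned in practice, random here)
    returns:    (T, K) weights, each row summing to 1
    """
    logits = features @ primitives.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def encode_pair(features, primitives, temperature=1.0):
    """Aggregate weighted residuals to the primitives into one fixed-length vector."""
    weights = soft_assign(features, primitives, temperature)   # (T, K)
    residuals = features[:, None, :] - primitives[None, :, :]  # (T, K, D)
    encoding = (weights[:, :, None] * residuals).sum(axis=0)   # (K, D)
    flat = encoding.reshape(-1)                                # (K * D,)
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

# Illustrative data only: 4 primitives, 8-dimensional features.
rng = np.random.default_rng(0)
primitives = rng.normal(size=(4, 8))
short_pair = rng.normal(size=(5, 8))    # tubelet pair co-occurring for 5 frames
long_pair = rng.normal(size=(30, 8))    # tubelet pair co-occurring for 30 frames

enc_short = encode_pair(short_pair, primitives)
enc_long = encode_pair(long_pair, primitives)
print(enc_short.shape, enc_long.shape)
```

Under these assumptions, pairs that co-occur for different numbers of frames map to vectors of the same length (K times D), which is what makes such an encoding compact and comparable across timespans.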
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
