A Nonparametric Model for Multimodal Collaborative Activities Summarization
Guy Rosman, John W. Fisher III, Daniela Rus

TL;DR
This paper introduces a Bayesian nonparametric model that combines video and GPS data to analyze and summarize collaborative human activities, especially in urban environments with noisy and incomplete data.
Contribution
The paper presents a novel nonparametric Bayesian model for integrating multimodal egocentric data to improve activity detection, classification, and summarization.
Findings
Effective activity detection and classification demonstrated
Improved handling of noisy and partial observations
Validated on synthetic and real egocentric datasets
Abstract
Egocentric data streams provide a unique opportunity to reason about joint behavior by pooling data across individuals. This is especially evident in urban environments, which teem with human activity but suffer from incomplete and noisy data. Collaborative human activities exhibit common spatial, temporal, and visual characteristics that facilitate inference across individuals from multiple sensory modalities, as we explore in this paper from the perspective of meetings. We propose a new Bayesian nonparametric model that enables us to efficiently pool video and GPS data from multiple individuals for collaborative activity analysis. We demonstrate the utility of this model for inference tasks such as activity detection, classification, and summarization. We further demonstrate how spatio-temporal structure embedded in our model enables better understanding of partial and noisy…
Taxonomy
Topics: Speech and dialogue systems · Advanced Text Analysis Techniques · Semantic Web and Ontologies
