Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock

TL;DR
This paper introduces Spherical World-Locking (SWL), an egocentric scene representation that transforms multisensory streams on a sphere with respect to head orientation. Doing so offsets self-motion and improves spatial synchronization between input modalities.
Contribution
The paper presents SWL, a new spherical world-locked representation that better handles self-motion and multisensory data alignment in egocentric videos, along with a transformer-based architecture for scene understanding.
Findings
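SWL improves spatial synchronization between input modalities compared with conventional head-locked representations that use a 2D planar field-of-view.
The framework enhances performance on egocentric video tasks.
It offsets the challenges that self-motion poses for multisensory data.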
Abstract
Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which implicitly transforms multisensory streams with respect to measurements of head orientation. Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion, allowing for improved spatial synchronization between input modalities. Using a set of multisensory embeddings on a world-locked sphere, we design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation, without requiring expensive projections between image and world coordinate systems. We evaluate the effectiveness of the proposed…
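To make the idea of world-locking concrete, the following is a minimal geometric sketch of re-expressing a head-locked direction on a sphere using a head-orientation measurement. It is not the authors' implementation: the abstract describes the transformation as implicit and gives no details here, so the explicit rotation, the axis convention (x forward, y left, z up), the yaw-only head motion, and all function names (`sph_to_unit`, `unit_to_sph`, `yaw_rotation`, `world_lock`) are assumptions chosen for illustration.

```python
import numpy as np


def sph_to_unit(azimuth, elevation):
    """Azimuth/elevation in radians -> unit vector (assumed axes: x forward, y left, z up)."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])


def unit_to_sph(v):
    """Vector -> (azimuth, elevation) in radians on the unit sphere."""
    x, y, z = v / np.linalg.norm(v)
    return np.arctan2(y, x), np.arcsin(np.clip(z, -1.0, 1.0))


def yaw_rotation(yaw):
    """Head-to-world rotation for a head turned by `yaw` radians about the vertical axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


def world_lock(direction_head, head_to_world):
    """Re-express a head-locked direction in the fixed world frame (hypothetical helper)."""
    return head_to_world @ direction_head


if __name__ == "__main__":
    # A static source at a fixed world azimuth, observed while the head turns.
    source_world = sph_to_unit(np.deg2rad(30.0), 0.0)
    for head_yaw_deg in (0.0, 20.0, -45.0):
        R = yaw_rotation(np.deg2rad(head_yaw_deg))
        # Head-locked observation: the world direction seen from the rotated head.
        d_head = R.T @ source_world
        az_head, _ = unit_to_sph(d_head)
        az_world, _ = unit_to_sph(world_lock(d_head, R))
        print(head_yaw_deg, np.rad2deg(az_head), np.rad2deg(az_world))
```

In this sketch the head-locked azimuth of the static source changes whenever the head turns, while the world-locked azimuth does not. That is the property that lets observations from different modalities and time steps be compared on one sphere despite self-motion.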
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
Methods: Sparse Evolutionary Training
