Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders

Kosta Dakic; Kanchana Thilakarathna; Rodrigo N. Calheiros; Teng Joon Lim

arXiv:2410.04817 · cs.CV · November 11, 2025

TL;DR

This paper introduces a semantic-guided masking strategy combined with masked autoencoders to improve communication efficiency in multiview perception systems, maintaining high detection and tracking accuracy while reducing data transmission.

Contribution

A semantic-guided masking approach, integrated with masked autoencoders (MAEs), improves resource efficiency in multiview perception and outperforms random masking in both accuracy and data reduction.

Findings

01. Achieves comparable detection and tracking performance at high masking ratios.

02. Reduces transmission data volume significantly compared to baseline methods.

03. Selective masking outperforms random masking in accuracy and efficiency.

Abstract

Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, particularly for resource-limited camera nodes like drones. This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs). We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions. This approach, combined with an MAE, reduces communication overhead while preserving essential visual information. We evaluate our method on both virtual and real-world multiview datasets, demonstrating comparable performance in terms of detection and tracking performance metrics…
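
The abstract describes the masking strategy only at a high level, so the following is a minimal sketch of one way semantic-guided patch selection with a power function could work, not the authors' implementation. It assumes a binary segmentation mask is already available (in practice from a pre-trained segmentation model), scores each patch by its fraction of foreground pixels, raises that score to a tunable exponent, and samples the patches to keep in proportion to the result. The names `patch_scores`, `select_patches`, `gamma`, and `eps` are hypothetical and do not come from the paper.

```python
import random


def patch_scores(seg_mask, patch):
    """Fraction of foreground pixels in each patch, in row-major patch order.

    seg_mask is a 2D list of 0/1 values whose height and width are
    multiples of `patch` (an assumption of this sketch).
    """
    height, width = len(seg_mask), len(seg_mask[0])
    scores = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            foreground = sum(
                seg_mask[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            scores.append(foreground / (patch * patch))
    return scores


def select_patches(scores, mask_ratio, gamma, eps=1e-3, seed=0):
    """Return sorted indices of the patches to keep (i.e. to transmit).

    Each patch gets weight (score + eps) ** gamma. gamma = 0 gives uniform
    random masking; larger gamma concentrates the kept patches on regions
    the segmentation marks as foreground. The exact functional form is an
    assumption made for illustration.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(scores) * (1.0 - mask_ratio)))
    weights = [(s + eps) ** gamma for s in scores]
    # Weighted sampling without replacement via an exponential race:
    # draw an Exp(1) variate per patch, divide by its weight, and keep
    # the n_keep smallest arrival times.
    arrival = [rng.expovariate(1.0) / w for w in weights]
    order = sorted(range(len(scores)), key=lambda i: arrival[i])
    return sorted(order[:n_keep])


# Toy 8x8 mask with a 4x4 foreground object in the top-left corner.
seg_mask = [[1 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
scores = patch_scores(seg_mask, patch=4)
kept = select_patches(scores, mask_ratio=0.75, gamma=4.0)
```

In this sketch only the kept patches would be sent from the camera node, and an MAE decoder at the receiver would reconstruct the masked ones. With a masking ratio of 0.75, one of the four toy patches is transmitted, and the large `gamma` makes it almost certain to be the one covering the object.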

Taxonomy

Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Masked autoencoder