Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks
Jakub Łucki, Jonathan Becktor, Georgios Georgakis, Rob Royce, Shehryar Khattak

TL;DR
This paper introduces VPEngine, a modular framework that enables efficient multi-task visual perception on resource-constrained robots by sharing a backbone model across tasks, reducing redundancy, and optimizing GPU usage for real-time performance.
Contribution
The work presents a novel modular framework that shares a backbone model across multiple perception tasks, reducing computation and memory redundancy while enabling dynamic task prioritization.
Findings
Achieves up to 3x speedup over sequential execution.
Maintains constant memory footprint during multitasking.
Real-time performance at ≥50 Hz on NVIDIA Jetson Orin AGX.
Abstract
Deploying multiple machine learning models on resource-constrained robotic platforms for different perception tasks often results in redundant computations, large memory footprints, and complex integration challenges. In response, this work presents Visual Perception Engine (VPEngine), a modular framework designed to enable efficient GPU usage for visual multitasking while maintaining extensibility and developer accessibility. Our framework architecture leverages a shared foundation model backbone that extracts image representations, which are efficiently shared, without any unnecessary GPU-CPU memory transfers, across multiple specialized task-specific model heads running in parallel. This design eliminates the computational redundancy inherent in the feature extraction component when deploying traditional sequential models while enabling dynamic task prioritization based on application…
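The shared-backbone pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the VPEngine API: the class `SharedBackbonePipeline`, the toy backbone, and the two toy heads are hypothetical names invented for illustration, and standard-library threads stand in for heads running in parallel. The sketch does not model GPU execution or the avoidance of GPU-CPU transfers. It only shows the core idea that features are extracted once per image and reused by every head.

```python
from concurrent.futures import ThreadPoolExecutor


class SharedBackbonePipeline:
    """Hypothetical illustration of one backbone feeding several heads."""

    def __init__(self, backbone, heads):
        self.backbone = backbone      # callable: image -> features
        self.heads = dict(heads)      # task name -> callable: features -> output
        self.backbone_calls = 0       # counts feature extractions, for inspection

    def infer(self, image):
        # Extract features exactly once per image.
        self.backbone_calls += 1
        features = self.backbone(image)
        # Hand the same features object to every head and run them concurrently.
        with ThreadPoolExecutor(max_workers=len(self.heads)) as pool:
            futures = {
                name: pool.submit(head, features)
                for name, head in self.heads.items()
            }
            return {name: future.result() for name, future in futures.items()}


def toy_backbone(image):
    # Stand-in for a foundation model: one mean value per image row.
    return [sum(row) / len(row) for row in image]


def toy_depth_head(features):
    # Stand-in for a task head that scales each feature.
    return [2.0 * f for f in features]


def toy_label_head(features):
    # Stand-in for a task head that produces a single label.
    return "bright" if max(features) > 0.5 else "dark"


pipeline = SharedBackbonePipeline(
    toy_backbone,
    {"depth": toy_depth_head, "label": toy_label_head},
)
outputs = pipeline.infer([[1.0, 3.0], [0.0, 0.5]])
print(outputs)
print("backbone calls:", pipeline.backbone_calls)
```

A sequential deployment of two independent models would run feature extraction twice for the same image. Here `backbone_calls` stays at one per image regardless of how many heads are registered, which is the redundancy the abstract says the framework removes.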
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
