Perception for Autonomous Systems (PAZ)
Octavio Arriaga, Matias Valdenegro-Toro, Mohandass Muthuraja, Sushma Devaramani, Frank Kirchner

TL;DR
The paper introduces PAZ, a hierarchical perception software library for autonomous systems that enables flexible, modular processing pipelines for various perception tasks using machine learning models.
Contribution
It presents a novel hierarchical abstraction framework in PAZ, facilitating customizable perception pipelines for diverse autonomous system applications.
Findings
Supports multiple perception tasks including 2D/3D detection and pose estimation.
Enables reusable training and prediction pipelines.
Provides hierarchical modular design for flexible perception processing.
Abstract
In this paper we introduce the Perception for Autonomous Systems (PAZ) software library. PAZ is a hierarchical perception library that allows users to manipulate multiple levels of abstraction in accordance with their requirements or skill level. More specifically, PAZ is divided into three hierarchical levels, which we refer to as pipelines, processors, and backends. These abstractions allow users to compose functions in a hierarchical modular scheme that can be applied to the preprocessing, data augmentation, prediction, and postprocessing of inputs and outputs of machine learning (ML) models. PAZ uses these abstractions to build reusable training and prediction pipelines for multiple robot perception tasks, such as 2D keypoint estimation, 2D object detection, 3D keypoint discovery, 6D pose estimation, emotion classification, face recognition, instance segmentation, and attention mechanisms.
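The three levels named in the abstract can be illustrated with a minimal sketch. This is not PAZ's actual API: the names `normalize`, `flip`, `Processor`, and `Pipeline` are hypothetical and chosen only to show how plain backend functions might be wrapped into processors and then composed into a pipeline.

```python
from typing import Any, Callable, Iterable, List, Optional


# Backend level: plain functions operating on data.
# (Hypothetical helpers for illustration, not functions from PAZ.)
def normalize(pixels: List[float]) -> List[float]:
    """Scale 8-bit pixel values to the range [0, 1]."""
    return [value / 255.0 for value in pixels]


def flip(pixels: List[float]) -> List[float]:
    """Reverse the pixel order, a stand-in for a horizontal flip."""
    return pixels[::-1]


# Processor level: a callable object wrapping one backend function.
class Processor:
    def __init__(self, function: Callable[[Any], Any], name: Optional[str] = None):
        self.function = function
        # Fall back to the class name for callables without __name__.
        self.name = name or getattr(function, "__name__", type(function).__name__)

    def __call__(self, x: Any) -> Any:
        return self.function(x)


# Pipeline level: an ordered composition of processors.
class Pipeline:
    def __init__(self, processors: Optional[Iterable[Processor]] = None):
        self.processors = list(processors or [])

    def add(self, processor: Processor) -> "Pipeline":
        self.processors.append(processor)
        return self

    def __call__(self, x: Any) -> Any:
        # Feed the output of each processor into the next one.
        for processor in self.processors:
            x = processor(x)
        return x


preprocess = Pipeline([Processor(normalize), Processor(flip)])
result = preprocess([0.0, 127.5, 255.0])

# A pipeline is itself callable, so it can be wrapped as a processor
# and reused inside a larger pipeline.
nested = Pipeline([Processor(preprocess, name="preprocess"), Processor(flip)])
nested_result = nested([0.0, 255.0])
```

Because each level is just a callable, a user in this sketch can work at whichever level suits them: call a backend function directly, swap one processor for another, or reuse a whole pipeline as a single step.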
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
