C^3Net: End-to-End deep learning for efficient real-time visual active camera control
Christos Kyrkou

TL;DR
This paper introduces C^3Net, an end-to-end deep learning model that directly maps raw visual input to camera control commands, enabling efficient, real-time active monitoring with improved performance and lower computational demands.
Contribution
The paper presents a novel deep neural network that directly controls camera movement from raw images, eliminating traditional modular pipelines and enabling real-time, resource-efficient active vision.
Findings
Achieves over 10 FPS on embedded hardware.
Outperforms traditional methods in target monitoring and active time.
Robust to varying environmental conditions.
Abstract
The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control. Traditionally, the active monitoring task has been handled through a pipeline of modules such as detection, filtering, and control. However, such pipelines are difficult to optimize jointly, and their many parameters are hard to tune for real-time processing in resource-constrained systems. In this paper, a deep Convolutional Camera Controller Neural Network is proposed to go directly from visual information to camera movement, providing an efficient solution to the active vision problem. It is trained end-to-end without bounding box annotations to control a camera and follow multiple targets from raw pixel values. Evaluation through both a simulation framework and real experimental setup,…
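The idea of going "directly from visual information to camera movement" can be pictured as a control loop with a single function from pixels to a pan/tilt command, with no separate detection or filtering stage. The sketch below is only an illustration of that interface, not the paper's network: `centroid_controller` is a hand-written stand-in for a trained model, and the names `step_camera` and `run_active_camera`, the `gain` of 5 degrees, and the 90 degree limit are all assumptions made for the example.

```python
from typing import Callable, Iterable, Sequence, Tuple

Frame = Sequence[Sequence[float]]   # grayscale image as rows of pixel intensities
Command = Tuple[float, float]       # (pan, tilt) step, each roughly in [-1, 1]
Angles = Tuple[float, float]        # (pan, tilt) camera angles in degrees


def centroid_controller(frame: Frame) -> Command:
    """Hypothetical stand-in for a trained network: steer toward the brightness centroid."""
    height, width = len(frame), len(frame[0])
    total = sum(sum(row) for row in frame)
    if total == 0:
        return (0.0, 0.0)  # nothing visible, hold position
    cx = sum(x * v for row in frame for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(frame) for v in row) / total
    # Normalise the offset from the image centre; guard against 1-pixel-wide frames.
    half_w = max((width - 1) / 2, 1.0)
    half_h = max((height - 1) / 2, 1.0)
    return ((cx - (width - 1) / 2) / half_w, (cy - (height - 1) / 2) / half_h)


def step_camera(angles: Angles, command: Command,
                gain: float = 5.0, limit: float = 90.0) -> Angles:
    """Apply one command and clamp the result to the assumed mechanical range."""
    pan = max(-limit, min(limit, angles[0] + gain * command[0]))
    tilt = max(-limit, min(limit, angles[1] + gain * command[1]))
    return (pan, tilt)


def run_active_camera(frames: Iterable[Frame],
                      controller: Callable[[Frame], Command],
                      angles: Angles = (0.0, 0.0)) -> Angles:
    """End-to-end loop: each frame goes straight to the controller, then to the motors."""
    for frame in frames:
        angles = step_camera(angles, controller(frame))
    return angles


# A 3x3 frame with a single bright pixel on the right edge of the middle row.
frame = [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]]
final_angles = run_active_camera([frame], centroid_controller)
```

In an actual end-to-end system, a learned model would take the place of `centroid_controller` behind the same `Frame -> Command` signature, so the rest of the loop would not need to change.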
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Advanced Image Processing Techniques
