A Surveillance Based Interactive Robot

Kshitij Kavimandan, Pooja Mangal, and Devanshi Mehta

arXiv:2508.13319·cs.RO·August 20, 2025

TL;DR

This paper presents a mobile surveillance robot that streams video, responds to speech commands, and detects objects using off-the-shelf hardware and open-source software, enabling real-time monitoring and interaction.

Contribution

The paper introduces a cost-effective, modular surveillance robot integrating real-time video streaming, speech interaction, and object detection using Raspberry Pi units and open-source tools.

Findings

01

Robot detects objects at interactive frame rates on CPU

02

Speech commands are recognized reliably and translated accurately

03

System demonstrates effective indoor surveillance and interaction

Abstract

We build a mobile surveillance robot that streams video in real time and responds to speech so a user can monitor and steer it from a phone or browser. The system uses two Raspberry Pi 4 units: a front unit on a differential drive base with camera, mic, and speaker, and a central unit that serves the live feed and runs perception. Video is sent with FFmpeg. Objects in the scene are detected using YOLOv3 to support navigation and event awareness. For voice interaction, we use Python libraries for speech recognition, multilingual translation, and text-to-speech, so the robot can take spoken commands and read back responses in the requested language. A Kinect RGB-D sensor provides visual input and obstacle cues. In indoor tests the robot detects common objects at interactive frame rates on CPU, recognises commands reliably, and translates them to actions without manual control. The design…
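The abstract says recognised speech is translated into actions on a differential drive base. A minimal sketch of that last step is shown below, assuming the speech recogniser has already produced a text transcript. The command vocabulary, the speed values, and the function name `command_to_wheel_speeds` are all hypothetical; the paper's actual command set and motor interface are not given here.

```python
from typing import Dict, Tuple

# Hypothetical vocabulary: keyword -> (left, right) normalised wheel speeds.
# On a differential drive, equal speeds drive straight and opposite speeds
# turn the robot in place.
COMMANDS: Dict[str, Tuple[float, float]] = {
    "forward": (1.0, 1.0),
    "back": (-1.0, -1.0),
    "left": (-0.5, 0.5),
    "right": (0.5, -0.5),
    "stop": (0.0, 0.0),
}


def command_to_wheel_speeds(transcript: str) -> Tuple[float, float]:
    """Return wheel speeds for the first known keyword in the transcript."""
    for word in transcript.lower().split():
        word = word.strip(".,!?")  # drop punctuation attached to the word
        if word in COMMANDS:
            return COMMANDS[word]
    # Unrecognised input: stop (an assumed safe default, not from the paper).
    return COMMANDS["stop"]


if __name__ == "__main__":
    print(command_to_wheel_speeds("Please move forward."))
    print(command_to_wheel_speeds("Turn LEFT!"))
```

Stopping on unrecognised input is one cautious design choice for a robot steered by voice; a real system might instead keep the previous command or ask the user to repeat it.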
