An Intelligent AI glasses System with Multi-Agent Architecture for Real-Time Voice Processing and Task Execution

Sheng-Kai Chen; Jyh-Horng Wu; Ching-Yao Lin; Yen-Ting Lin

arXiv:2601.06235 · cs.SD · January 13, 2026

TL;DR

This paper introduces an AI glasses system that uses a dual-agent architecture for real-time speech recognition, AI processing, and remote task execution, with support for multilingual voice commands and cross-platform task execution.

Contribution

The paper presents a multi-agent framework for AI glasses that integrates real-time voice processing, local LLMs, and RTSP streaming, enabling multilingual command handling and remote task execution.

Findings

01 Successful real-time voice command processing

02 Multilingual support demonstrated

03 Effective cross-platform task execution

Abstract

This paper presents an AI glasses system that integrates real-time voice processing, artificial intelligence (AI) agents, and cross-network streaming capabilities. The system employs a dual-agent architecture in which Agent 01 handles Automatic Speech Recognition (ASR) and Agent 02 manages AI processing through local Large Language Models (LLMs), Model Context Protocol (MCP) tools, and Retrieval-Augmented Generation (RAG). The system supports real-time RTSP streaming for voice and video data transmission, eye-tracking data collection, and remote task execution through RabbitMQ messaging. The implementation demonstrates successful voice command processing with multilingual support and cross-platform task execution.
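The abstract describes a hand-off from Agent 01 (ASR) to Agent 02. Agent 02 chooses among a local LLM, MCP tools, and RAG, and remote tasks are sent over RabbitMQ. The sketch below illustrates that flow under several assumptions and is not the authors' implementation. The ASR step is a placeholder that treats the input bytes as already-transcribed text. The keyword routing rules, the `TaskMessage` fields, and the `glasses.results` reply queue are hypothetical. An in-process `queue.Queue` stands in for the real transport between the two agents.

```python
import json
import queue
from dataclasses import asdict, dataclass


@dataclass
class TaskMessage:
    """Hypothetical envelope for a remote task published to a message broker."""
    action: str
    reply_to: str


def agent01_asr(audio_chunk: bytes) -> str:
    # Placeholder for real speech recognition: this sketch assumes the
    # "audio" is already UTF-8 text so the pipeline runs without a model.
    return audio_chunk.decode("utf-8").strip()


def agent02_route(transcript: str) -> dict:
    # Hypothetical keyword routing; the paper does not specify how Agent 02
    # chooses between the local LLM, MCP tools, and RAG.
    lowered = transcript.lower()
    if lowered.startswith("run "):
        # Remote task: serialize a JSON message that could be sent to RabbitMQ.
        msg = TaskMessage(action=transcript[4:], reply_to="glasses.results")
        return {"route": "remote_task", "body": json.dumps(asdict(msg))}
    if lowered.startswith("use "):
        # Hypothetical MCP tool invocation by name.
        return {"route": "mcp_tool", "tool": transcript[4:]}
    if lowered.endswith("?"):
        # Questions go to retrieval-augmented generation.
        return {"route": "rag", "query": transcript}
    # Everything else is answered directly by the local LLM.
    return {"route": "llm", "prompt": transcript}


def run_pipeline(chunks: list[bytes]) -> list[dict]:
    # The in-process queue stands in for the hand-off between the two agents.
    handoff = queue.Queue()
    for chunk in chunks:
        handoff.put(agent01_asr(chunk))
    results = []
    while not handoff.empty():
        results.append(agent02_route(handoff.get()))
    return results


if __name__ == "__main__":
    utterances = [b"run backup job", b"What is on my calendar?", b"hello"]
    for result in run_pipeline(utterances):
        print(result)
```

Running the module prints one routing decision per utterance. In a real deployment, the JSON `body` of a `remote_task` result could be published to a RabbitMQ queue with a client library, for example pika's `basic_publish`. The keyword rules would presumably be replaced by the LLM deciding which tool to invoke.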

Taxonomy

Topics: Speech and dialogue systems · AI in Service Interactions · Speech Recognition and Synthesis