Motion-to-Response Content Generation via Multi-Agent AI System with Real-Time Safety Verification

HyeYoung Lee

arXiv:2601.13589 · cs.AI · January 21, 2026

TL;DR

This paper presents a multi-agent AI system that generates response-oriented media content in real time from audio-derived emotional cues, with a dedicated verification agent enforcing safety and age-appropriateness before content is released.

Contribution

It introduces a novel multi-agent architecture with explicit safety verification for real-time, response-oriented media content generation from emotional signals.

Findings

1. Achieved 73.2% emotion recognition accuracy
2. Reached 89.4% response mode consistency
3. Ensured 100% safety compliance

Abstract

This paper proposes a multi-agent artificial intelligence system that generates response-oriented media content in real time based on audio-derived emotional signals. Unlike conventional speech emotion recognition studies that focus primarily on classification accuracy, our approach emphasizes the transformation of inferred emotional states into safe, age-appropriate, and controllable response content through a structured pipeline of specialized AI agents. The proposed system comprises four cooperative agents: (1) an Emotion Recognition Agent with CNN-based acoustic feature extraction, (2) a Response Policy Decision Agent for mapping emotions to response modes, (3) a Content Parameter Generation Agent for producing media control parameters, and (4) a Safety Verification Agent enforcing age-appropriateness and stimulation constraints. We introduce an explicit safety verification loop…
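
The four-agent flow described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the emotion labels, response modes, parameter names (`brightness`, `tempo_bpm`), numeric limits, and the clamp-then-recheck strategy are all hypothetical placeholders, and the CNN-based recognizer is replaced by a stub that simply picks the highest-scoring label from precomputed class scores.

```python
from dataclasses import dataclass, replace

# Every label, mode, parameter, and limit here is a hypothetical placeholder;
# the excerpt does not specify the paper's actual values.


@dataclass(frozen=True)
class ContentParams:
    mode: str
    brightness: float  # assumed range 0.0 to 1.0
    tempo_bpm: int


# Assumed emotion -> response-mode policy table.
POLICY = {"distress": "soothe", "joy": "celebrate", "neutral": "engage"}

# Assumed stimulation limits enforced by the safety stage.
MAX_BRIGHTNESS = 0.7
MAX_TEMPO_BPM = 110
SAFE_DEFAULT = ContentParams("soothe", 0.4, 70)


def recognize_emotion(scores):
    """Stub for the Emotion Recognition Agent: return the top-scoring label."""
    return max(scores, key=scores.get)


def decide_response_mode(emotion):
    """Response Policy Decision Agent: map an emotion to a response mode."""
    return POLICY.get(emotion, "engage")


def generate_params(mode):
    """Content Parameter Generation Agent: look up preset media parameters."""
    presets = {
        "soothe": ContentParams("soothe", 0.4, 70),
        "celebrate": ContentParams("celebrate", 0.95, 140),
        "engage": ContentParams("engage", 0.6, 100),
    }
    return presets[mode]


def violations(params):
    """Safety Verification Agent: list the parameters that exceed a limit."""
    found = []
    if params.brightness > MAX_BRIGHTNESS:
        found.append("brightness")
    if params.tempo_bpm > MAX_TEMPO_BPM:
        found.append("tempo_bpm")
    return found


def verify_loop(params, max_rounds=3):
    """Clamp offending parameters and re-check until they pass or rounds run out."""
    for _ in range(max_rounds):
        bad = violations(params)
        if not bad:
            return params
        if "brightness" in bad:
            params = replace(params, brightness=MAX_BRIGHTNESS)
        if "tempo_bpm" in bad:
            params = replace(params, tempo_bpm=MAX_TEMPO_BPM)
    # Fall back to a known-safe preset if the last correction still fails.
    return params if not violations(params) else SAFE_DEFAULT


def run_pipeline(scores):
    """Chain the four agents: recognize, decide, generate, verify."""
    emotion = recognize_emotion(scores)
    mode = decide_response_mode(emotion)
    return verify_loop(generate_params(mode))
```

In this sketch, a call such as `run_pipeline({"joy": 0.8, "neutral": 0.2})` selects the over-stimulating "celebrate" preset, which the verification loop clamps to the assumed limits before returning it. Keeping verification as a separate final stage means no generated parameters leave the pipeline unchecked, which is the property the abstract emphasizes.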

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Face recognition and analysis