ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models

Chi-Yuan Hsiao; Ke-Han Lu; Yu-Kuan Fu; Guan-Ting Lin; Hsiao-Tsung Hung; Hung-yi Lee

arXiv:2604.10065 · cs.CL · April 14, 2026

TL;DR

ASPIRin introduces a novel reinforcement learning framework for full-duplex speech language models that improves turn-taking and interactivity without sacrificing semantic quality.

Contribution

It proposes Action Space Projection, which maps the text vocabulary into a binary speech/silence state, and applies Group Relative Policy Optimization (GRPO) with rule-based rewards on that projected space, decoupling when to speak from what to say.

Findings

01

Optimizes turn-taking, backchanneling, and pause handling.

02

Reduces the portion of duplicate n-grams by over 50%.

03

Eliminates degenerative repetition.

Abstract

End-to-end full-duplex Speech Language Models (SLMs) require precise turn-taking for natural interaction. However, optimizing temporal dynamics via standard raw-token reinforcement learning (RL) degrades semantic quality, causing severe generative collapse and repetition. We propose ASPIRin, an interactivity-optimized RL framework that explicitly decouples when to speak from what to say. Using Action Space Projection, ASPIRin maps the text vocabulary into a coarse-grained binary state (active speech vs. inactive silence). By applying Group Relative Policy Optimization (GRPO) with rule-based rewards, it balances user interruption and response latency. Empirical evaluations show ASPIRin optimizes interactivity across turn-taking, backchanneling, and pause handling. Crucially, isolating timing from token selection preserves semantic coherence and reduces the portion of duplicate n-grams by…
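The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the silence token id, the function names, and the use of NumPy are assumptions made here for illustration. The first part collapses a per-step distribution over the text vocabulary into two probabilities (inactive silence vs. active speech); the second computes the standard group-normalized advantage used in GRPO, where each rollout's reward is compared against the other rollouts sampled for the same prompt.

```python
import numpy as np

# Hypothetical ids of tokens that mean "stay silent" (an assumption, not from the paper).
SILENCE_IDS = np.array([0])


def project_to_binary(logits, silence_ids=SILENCE_IDS):
    """Collapse vocabulary logits of shape (T, V) into (T, 2): [p_inactive, p_active]."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Probability mass on silence tokens; everything else counts as active speech.
    p_inactive = probs[:, silence_ids].sum(axis=-1)
    return np.stack([p_inactive, 1.0 - p_inactive], axis=-1)


def binary_log_prob(logits, token_ids, silence_ids=SILENCE_IDS):
    """Log-probability of the projected speak/silent action taken at each step."""
    binary = project_to_binary(logits, silence_ids)
    # 1 = active speech, 0 = inactive silence.
    actions = (~np.isin(token_ids, silence_ids)).astype(int)
    picked = binary[np.arange(len(actions)), actions]
    return np.log(np.clip(picked, 1e-12, None))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its rollout group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Under these assumptions, the policy gradient would weight `binary_log_prob` by the group-relative advantage, so the reward signal only moves the speak/silent decision and leaves the choice among content tokens untouched. The rule-based rewards themselves (for interruption and latency) are not specified in this excerpt and are left out of the sketch.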
