F-Actor: Controllable Conversational Behaviour in Full-Duplex Models
Maike Züfle, Ondrej Klejch, Nicholas Sanders, Jan Niehues, Alexandra Birch, Tsz Kin Lam

TL;DR
This paper introduces F-Actor, an open, instruction-following full-duplex conversational speech model that produces natural, controllable dialogue behaviour and can be trained on about 2,000 hours of data under typical academic resource constraints.
Contribution
It presents the first open, instruction-following full-duplex speech model that is efficiently trainable with limited data and allows explicit control over conversational aspects.
Findings
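The model follows explicit instructions to control speaker voice, conversation topic, conversational behaviour (e.g., backchanneling and interruptions), and dialogue initiation.
Training requires only 2,000 hours of data: the audio encoder stays frozen and only the language model is finetuned, with no large-scale pretraining or multi-stage optimization.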
Code and model are publicly released for reproducible research.
Abstract
Spoken conversational systems require more than accurate speech generation to have human-like conversations: to feel natural and engaging, they must produce conversational behaviour that adapts dynamically to the context. Current spoken conversational systems, however, rarely allow such customization, limiting their naturalness and usability. In this work, we present the first open, instruction-following full-duplex conversational speech model that can be trained efficiently under typical academic resource constraints. By keeping the audio encoder frozen and finetuning only the language model, our model requires just 2,000 hours of data, without relying on large-scale pretraining or multi-stage optimization. The model can follow explicit instructions to control speaker voice, conversation topic, conversational behaviour (e.g., backchanneling and interruptions), and dialogue initiation.…
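The training recipe described in the abstract (a frozen audio encoder with only the language model finetuned) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the class ToySpeechLM, its layer sizes, the helper freeze_encoder, and the random data are all hypothetical stand-ins chosen only to show the freezing pattern.

```python
import torch
from torch import nn


class ToySpeechLM(nn.Module):
    """Hypothetical stand-in: a tiny 'audio encoder' feeding a tiny 'language model'."""

    def __init__(self, feat_dim=16, hidden=32, vocab=50):
        super().__init__()
        # Placeholder for a pretrained audio encoder (assumption, not the paper's encoder).
        self.audio_encoder = nn.Linear(feat_dim, hidden)
        # Placeholder for the language model that is finetuned.
        self.language_model = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, vocab)
        )

    def forward(self, feats):
        # The encoder is frozen, so no graph is needed for its forward pass.
        with torch.no_grad():
            encoded = self.audio_encoder(feats)
        return self.language_model(encoded)


def freeze_encoder(model):
    """Disable gradients for the encoder and return the remaining trainable parameters."""
    model.audio_encoder.requires_grad_(False)
    model.audio_encoder.eval()
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = ToySpeechLM()
trainable = freeze_encoder(model)
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# Random stand-in batch: 4 sequences, 10 frames, 16 features; random target tokens.
feats = torch.randn(4, 10, 16)
targets = torch.randint(0, 50, (4, 10))

encoder_before = model.audio_encoder.weight.clone()
lm_before = model.language_model[0].weight.clone()

# One training step: only the language model receives gradients and updates.
logits = model(feats)
loss = nn.functional.cross_entropy(logits.reshape(-1, 50), targets.reshape(-1))
loss.backward()
optimizer.step()
```

Passing only the trainable parameters to the optimizer keeps optimizer state small, which is one reason a frozen encoder can lower memory and compute cost in a setup like this.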
