Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots

Timur Ishuov; Michele Folgheraiter; Madi Nurmanov; Goncalo Gordo; Richárd Farkas; József Dombi

arXiv:2512.10477·cs.RO·March 3, 2026

TL;DR

The paper introduces Symphony, a novel heuristic algorithm for training humanoid robots efficiently and safely from scratch by combining regularization, limited noise, and a fading replay buffer for improved learning stability.

Contribution

The paper proposes Symphony, an algorithm that aims to improve sample efficiency and safety in robot learning through "Swaddling" regularization, limited noise, and a combined Actor-Critic approach.

Findings

01. Empirically safer training process for humanoid robots.

02. Improved sample efficiency compared to traditional methods.

03. Effective use of a fading replay buffer for stable learning.

Abstract

In our work we implicitly suggest that it is a misconception to think that humans learn fast. The learning process takes time. Babies start learning to move in the restricted fluid environment of the womb. Children are often limited by an underdeveloped body. Even adults are not allowed to participate in complex competitions right away. With robots, however, when learning from scratch, we often do not have the privilege of waiting for tens of millions of steps. "Swaddling" regularization restrains an agent during rapid but unstable development by penalizing action strength in a specific way that does not affect the actions directly. Symphony, a Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas that make it possible to train humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is well…

Code & Models

Models

timurgepard/symphony (Hugging Face model)

Taxonomy

Topics: Reinforcement Learning in Robotics · Embodied and Extended Cognition · Robotic Locomotion and Control