Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning

Aditya Kapoor; Yash Bhisikar; Benjamin Freed; Jan Peters; Mingfei Sun

arXiv:2511.01554·cs.MA·November 4, 2025

Open Access · 4 Reviews

TL;DR

This paper introduces an extension of Differentiable Discrete Communication Learning (DDCL) that supports unbounded signals, enabling efficient, adaptive communication in multi-agent reinforcement learning with reduced bandwidth and competitive performance.

Contribution

The paper extends DDCL to support unbounded signals, creating a universal layer for MARL that dynamically modulates message precision and reduces bandwidth without sacrificing performance.

Findings

01

Agents learn to adapt message precision based on task needs.

02

Bandwidth is reduced by over an order of magnitude while maintaining or improving performance.

03

A simple Transformer-based policy with DDCL matches complex architectures, questioning the need for specialized designs.

Abstract

Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide *whether* to communicate, not *how precisely*. Learning to optimize message precision at the bit-level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate *how* agents learn to dynamically modulate…
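To make the phrase "the required discretization step breaks gradient flow" concrete, here is a minimal sketch of subtractive dithered quantization on a fixed uniform grid, a standard signal-processing technique in which sender and receiver share a random seed. This is an illustration under stated assumptions, not the authors' implementation: the function names (`send`, `receive`, `roundtrip`), the grid step `delta`, and the shared `seed` are all hypothetical. It only reflects two ingredients the reviews on this page mention (synchronized randomness and a uniform grid) and shows that the scheme works for signed, unbounded inputs.

```python
import numpy as np

def send(z, delta, rng):
    """Sender: add dither, snap to the uniform grid, transmit integers only."""
    u = rng.uniform(-0.5, 0.5, size=z.shape)  # dither, in units of delta
    return np.round(z / delta + u).astype(np.int64)

def receive(m, delta, rng):
    """Receiver: regenerate the same dither from the shared seed and subtract it."""
    u = rng.uniform(-0.5, 0.5, size=m.shape)
    return (m - u) * delta

def roundtrip(z, delta, seed):
    """Run one send/receive pass with synchronized random generators."""
    m = send(z, delta, np.random.default_rng(seed))
    z_hat = receive(m, delta, np.random.default_rng(seed))
    return m, z_hat

# Hypothetical example values: signed, unbounded signal and a coarse grid.
z = np.array([-3.7, 0.02, 12.5])
delta = 0.5
m, z_hat = roundtrip(z, delta, seed=0)

# The reconstruction error never exceeds half a grid step.
max_err = np.max(np.abs(z_hat - z))
print("integers sent:", m, "max error:", max_err)
```

Because the receiver subtracts the same dither the sender added, the reconstruction error is bounded by `delta / 2`, and for this classic scheme it is uniformly distributed independently of `z`. The channel can therefore be treated as additive noise during training, which is what lets gradients pass through. A larger `delta` gives smaller integers (fewer bits) at the cost of a coarser reconstruction.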

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Originality: First, the generalization of the DDCL framework itself is a crucial technical step. Prior DDCL work was limited to positive, bounded signals, which imposed artificial architectural constraints (like requiring a sigmoid activation on policy outputs). This generalization to unbounded, signed, real-valued vectors removes those constraints, fundamentally changing DDCL from a niche technique to a universal, plug-and-play module for any MARL architecture. The authors back this up by pr

Weaknesses

- Originality: The framework's core novelty is arguably limited to the boundary conditions of the DDCL formulation. The foundational innovation was established in prior work (Freed et al., 2020c). The generalization to unbounded signals is crucial for applicability, but the mechanism itself is inherited. A more significant constraint on the originality is the sub-optimality of the derived communication cost, $L_{comms}$. The paper acknowledges that $L_{comms}$ is a surrogate loss derived using J

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The idea and the relaxation of the assumption in communication seem interesting.
2. The proof appears correct.
3. The analysis is extensive and quite interesting.

Weaknesses

However, I found the paper has shortcomings in:

1) the core issues (see my detailed comments below) when relaxing the assumption are not considered and addressed;
2) the theoretical analysis seems to have marginal modification from Freed's work [1,2];
3) the claims in the experiments are too strong, which is not sufficiently evidenced by the results.

[1] Discrete communication learning via backpropagation on bandwidth-limited communication networks. Master’s Thesis, Carnegie Mellon University,

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- **Clear conceptual improvement.** The paper makes a clean and well-motivated generalization of Freed et al.’s DDCL to unbounded real-valued signals, removing the restrictive assumption $z \in [0,1]$. This substantially broadens applicability to a wide range of MARL architectures.
- **Strong experimental validation.** The evaluation suite is extensive, covering both controlled and high-dimensional domains. Integrations into four established MARL+Comms baselines are convincing

Weaknesses

- **Clarity and presentation.** Figures are visually dense and captions occasionally unclear and confusing.
  - *Figure 1:* The caption is not well aligned with the figure. The term *“episodic plot”* is unclear, and it is not evident where “success rate remains perfect (1.0).”
  - *Figure 2:* The numerous STE baselines (`STE_[4,8,16]`) clutter the Pareto plots; consider reducing them or highlighting key configurations more clearly. The Pareto frontier is said to be “indicated by thick bl

Reviewer 04 · Rating 8 · Confidence 2

Strengths

1. Core Technical Contribution: A theoretically grounded and practically validated generalization of DDCL to unbounded signals.
2. Plug-and-Play Utility: Can be seamlessly integrated into multiple existing MARL algorithms without major architectural changes.
3. Clear Differentiable Objective: Well-justified communication cost term derived from information-theoretic principles.
4. Strong Empirical Results: Demonstrates bandwidth reduction of up to 10000× with competitive or superior task performance.

Weaknesses

The paper assumes synchronized randomness between agents to reconstruct discrete messages. It would strengthen the work to analyze how DDCL performs under desynchronized or noisy communication conditions. While the current DDCL design uses a fixed uniform grid, future work could explore learned non-uniform quantization or variational encoding for further compression efficiency. The empirical evaluation could be further enhanced by testing DDCL under more dynamic or heterogeneous communication

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Neural Networks and Reservoir Computing