Improving Discrete Optimisation Via Decoupled Straight-Through Estimator
Rushi Shah, Mingyuan Yan, Michael Curtis Mozer, Dianbo Liu

TL;DR
This paper introduces Decoupled Straight-Through, a simple yet effective modification to the STE that independently tunes exploration and gradient dispersion, leading to consistent improvements across various neural discrete optimization tasks.
Contribution
Decoupled Straight-Through introduces separate temperature parameters for the forward and backward passes, so that exploration and gradient dispersion can be tuned independently. The authors report that it surpasses existing STE variants.
Findings
Decoupled ST outperforms existing STE variants across tasks.
Optimal temperatures for forward and backward passes are significantly different.
Single-temperature methods are limited because one parameter must control both exploration and gradient dispersion.
Abstract
The Straight-Through Estimator (STE) is the dominant method for training neural networks with discrete variables, enabling gradient-based optimisation by routing gradients through a differentiable surrogate. However, existing STE variants conflate two fundamentally distinct concerns: forward-pass stochasticity, which controls exploration and latent space utilisation, and backward-pass gradient dispersion, i.e., how learning signals are distributed across categories. We show that these concerns are qualitatively different and that tying them to a single temperature parameter leaves significant performance gains untapped. We propose Decoupled Straight-Through (Decoupled ST), a minimal modification that introduces separate temperatures for the forward pass and the backward pass. This simple change enables independent tuning of exploration and gradient dispersion. Across…
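The idea of separate forward and backward temperatures can be illustrated with a small dependency-free sketch. This is not the authors' implementation. It assumes a categorical variable sampled from a softmax over logits in the forward pass, and a softmax surrogate at a different temperature for the backward pass. The names `tau_forward`, `tau_backward`, `decoupled_st_forward`, and `decoupled_st_backward` are hypothetical, chosen here for illustration because the paper's own symbols are missing from this summary.

```python
import math
import random


def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def decoupled_st_forward(logits, tau_forward, rng):
    """Forward pass: sample a one-hot vector from softmax(logits / tau_forward).

    Assumption: a larger tau_forward flattens the sampling distribution,
    which means more exploration across categories.
    """
    probs = softmax(logits, tau_forward)
    index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return [1.0 if i == index else 0.0 for i in range(len(logits))]


def decoupled_st_backward(logits, upstream_grad, tau_backward):
    """Backward pass: treat the one-hot output as if it were
    softmax(logits / tau_backward) and apply the softmax Jacobian.

    Assumption: tau_backward only shapes how the learning signal is
    spread over categories; it does not affect what was sampled.
    """
    p = softmax(logits, tau_backward)
    weighted = sum(g * pi for g, pi in zip(upstream_grad, p))
    return [pi * (g - weighted) / tau_backward for g, pi in zip(upstream_grad, p)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 0.5, -0.2]
    # Hypothetical settings: sharp sampling, dispersed gradient.
    sample = decoupled_st_forward(logits, tau_forward=0.5, rng=rng)
    grad = decoupled_st_backward(logits, [0.3, -0.1, 0.8], tau_backward=2.0)
    print(sample, grad)
```

Setting `tau_forward == tau_backward` in this sketch recovers a single-temperature straight-through estimator, which makes the coupling that the abstract criticises easy to see: one value then fixes both the sampling distribution and the gradient spread.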
Taxonomy
Topics: Scheduling and Optimization Algorithms · Neural Networks and Applications
