Align Forward, Adapt Backward: Closing the Discretization Gap in Logic Gate Networks
Youngsung Kim

TL;DR
This paper investigates the training-inference mismatch that arises in logic gate networks when soft mixtures are used during training but hard selection is used at inference. It analyzes how four selection methods behave and introduces CAGE to improve gradient flow and accuracy.
Contribution
The paper introduces CAGE, a novel gradient estimation method that reduces the discretization gap and improves accuracy in logic gate networks.
Findings
Hard-ST achieves zero selection gap by construction.
Gumbel-ST suffers accuracy collapse at low temperatures.
CAGE maintains gradient flow and achieves high accuracy.
Abstract
In neural network models, soft mixtures of fixed candidate components (e.g., logic gates and sub-networks) are often used during training for stable optimization, while hard selection is typically used at inference. This raises questions about training-inference mismatch. We analyze this gap by separating forward-pass computation (hard selection vs. soft mixture) from stochasticity (with vs. without Gumbel noise). Using logic gate networks as a testbed, we observe distinct behaviors across four methods: Hard-ST achieves zero selection gap by construction; Gumbel-ST achieves near-zero gap when training succeeds but suffers accuracy collapse at low temperatures; Soft-Mix achieves small gap only at low temperature via weight concentration; and Soft-Gumbel exhibits large gaps despite Gumbel noise, confirming that noise alone does not reduce the gap. We propose CAGE (Confidence-Adaptive…
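The two axes described in the abstract (hard selection vs. soft mixture in the forward pass, with vs. without Gumbel noise) can be sketched for a single gate as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the four-gate candidate set, the real-valued gate relaxations, the names `forward`, `inference`, and `selection_gap`, and the definition of the gap as the absolute difference between the training-time output and the argmax output are all hypothetical choices made for this example. Only forward values are computed; the straight-through backward pass and CAGE itself are not modelled.

```python
import math
import random

# Real-valued relaxations of a few two-input gates (illustrative subset,
# assumed for this sketch; not taken from the paper).
GATES = {
    "AND": lambda a, b: a * b,
    "OR": lambda a, b: a + b - a * b,
    "XOR": lambda a, b: a + b - 2 * a * b,
    "NAND": lambda a, b: 1 - a * b,
}


def softmax(xs, tau=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel(rng):
    """One standard Gumbel(0, 1) sample via the inverse transform."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def forward(logits, outputs, mode, tau=1.0, rng=None):
    """Training-time forward value of one gate under the four modes.

    hard_st     : hard selection, no noise
    gumbel_st   : hard selection, Gumbel noise added to the logits
    soft_mix    : softmax-weighted mixture, no noise
    soft_gumbel : softmax-weighted mixture, Gumbel noise added to the logits
    """
    rng = rng or random.Random()
    if mode in ("gumbel_st", "soft_gumbel"):
        z = [l + gumbel(rng) for l in logits]
    else:
        z = list(logits)
    if mode in ("hard_st", "gumbel_st"):
        return outputs[argmax(z)]
    weights = softmax(z, tau)
    return sum(w * o for w, o in zip(weights, outputs))


def inference(logits, outputs):
    """Inference-time value: hard argmax over the noise-free logits."""
    return outputs[argmax(logits)]


def selection_gap(logits, outputs, mode, tau=1.0, rng=None):
    """Absolute difference between training-time and inference-time outputs."""
    return abs(forward(logits, outputs, mode, tau, rng) - inference(logits, outputs))


if __name__ == "__main__":
    a, b = 1.0, 0.0
    outputs = [gate(a, b) for gate in GATES.values()]
    logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical learned selection logits
    for mode in ("hard_st", "gumbel_st", "soft_mix", "soft_gumbel"):
        for tau in (1.0, 0.1):
            gap = selection_gap(logits, outputs, mode, tau, random.Random(0))
            print(f"{mode:12s} tau={tau:<4} gap={gap:.6f}")
```

In this sketch, `hard_st` returns exactly the inference value, so its gap is zero by construction, and the `soft_mix` gap shrinks as `tau` decreases because the softmax weights concentrate on the top-scoring candidate. The noisy modes depend on the random draw, so their gaps vary from sample to sample.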
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques
