Align Forward, Adapt Backward: Closing the Discretization Gap in Logic Gate Networks

Youngsung Kim

arXiv:2603.14157 · cs.LG · March 17, 2026

TL;DR

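This paper investigates the training-inference mismatch that arises in logic gate networks when a soft mixture of gates is used during training but hard selection is used at inference. It compares four selection methods (Hard-ST, Gumbel-ST, Soft-Mix, and Soft-Gumbel) and introduces CAGE to improve gradient flow and accuracy.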

Contribution

The paper introduces CAGE, a novel gradient estimation method that reduces the discretization gap and improves accuracy in logic gate networks.

Findings

01. Hard-ST achieves zero selection gap by construction.

02. Gumbel-ST suffers accuracy collapse at low temperatures.

03. CAGE maintains gradient flow and achieves high accuracy.

Abstract

In neural network models, soft mixtures of fixed candidate components (e.g., logic gates and sub-networks) are often used during training for stable optimization, while hard selection is typically used at inference. This raises questions about training-inference mismatch. We analyze this gap by separating forward-pass computation (hard selection vs. soft mixture) from stochasticity (with vs. without Gumbel noise). Using logic gate networks as a testbed, we observe distinct behaviors across four methods: Hard-ST achieves zero selection gap by construction; Gumbel-ST achieves near-zero gap when training succeeds but suffers accuracy collapse at low temperatures; Soft-Mix achieves small gap only at low temperature via weight concentration; and Soft-Gumbel exhibits large gaps despite Gumbel noise, confirming that noise alone does not reduce the gap. We propose CAGE (Confidence-Adaptive…
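The two axes described in the abstract (hard vs. soft forward pass, with vs. without Gumbel noise) can be illustrated with a small forward-pass sketch. This is not the paper's implementation: the gate subset in `GATES`, the function names, and the definition of `selection_gap` as the absolute difference between the training-time output and the noise-free argmax output are all assumptions made for illustration, following the usual Gumbel-softmax convention. The straight-through backward pass and CAGE itself are omitted, since the excerpt does not specify them.

```python
import math
import random

# Illustrative subset of two-input gates as real-valued relaxations.
# ASSUMPTION: the gate set used in the paper is not given in this excerpt.
GATES = [
    lambda a, b: a * b,              # AND
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
    lambda a, b: 1.0 - a * b,        # NAND
]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def one_hot_argmax(xs):
    k = max(range(len(xs)), key=lambda i: xs[i])
    return [1.0 if i == k else 0.0 for i in range(len(xs))]

def gumbel(rng):
    # Standard Gumbel(0, 1) sample via inverse transform, clamped away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def selection_weights(logits, method, tau=1.0, rng=None):
    """Forward-pass gate weights for hard_st, gumbel_st, soft_mix, soft_gumbel."""
    if method in ("gumbel_st", "soft_gumbel"):
        rng = rng or random.Random(0)
        logits = [l + gumbel(rng) for l in logits]  # stochastic variants add noise
    scaled = [l / tau for l in logits]
    if method in ("hard_st", "gumbel_st"):
        return one_hot_argmax(scaled)  # hard forward: exactly one gate is used
    return softmax(scaled)             # soft forward: weighted mixture of gates

def gate_output(weights, a, b):
    return sum(w * g(a, b) for w, g in zip(weights, GATES))

def selection_gap(logits, method, a, b, tau=1.0, rng=None):
    # ASSUMPTION: gap = |training-time output - inference-time (argmax) output|.
    train_out = gate_output(selection_weights(logits, method, tau, rng), a, b)
    infer_out = gate_output(one_hot_argmax(logits), a, b)
    return abs(train_out - infer_out)
```

Under these assumptions, `selection_gap(logits, "hard_st", ...)` is zero for any positive `tau` because training and inference pick the same gate, while the `"soft_mix"` gap shrinks as `tau` decreases and the softmax weights concentrate on one gate.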

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques