A Layer Separation Optimization Framework for Cross-Entropy Training in Deep Learning

Yaru Liu; Michael K. Ng; Yiqi Gu

arXiv:2604.23225 · cs.LG · April 28, 2026

TL;DR

This paper introduces a layer separation framework that decomposes the cross-entropy training problem for deep neural networks into simpler subproblems, with the aim of alleviating the strong nonconvexity of training.

Contribution

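It proposes a layer separation strategy that introduces auxiliary variables for the hidden-layer outputs, proves that the resulting loss is an upper bound on the original cross-entropy loss, and designs alternating minimization algorithms whose loss decreases under appropriate conditions.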
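To make the idea of auxiliary variables concrete, here is a minimal sketch for a small fully connected ReLU network. It assumes that the auxiliary variables are the hidden pre-activations and that each one is tied to its layer by a quadratic penalty with weight `lam`; these choices, and all function names (`nested_loss`, `separated_loss`, `forward_aux`), are hypothetical illustrations rather than the paper's exact formulation.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def cross_entropy(logits, y):
    # Mean softmax cross-entropy; logits has shape (n, k), y holds integer labels.
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def nested_loss(Ws, X, y):
    # Original nested objective: CE(W_L relu(... relu(W_1 x))).
    h = X
    for W in Ws[:-1]:
        h = relu(h @ W.T)
    return cross_entropy(h @ Ws[-1].T, y)

def forward_aux(Ws, X):
    # Initialize one auxiliary variable per hidden layer from a forward pass.
    Zs, h = [], X
    for W in Ws[:-1]:
        Z = h @ W.T
        Zs.append(Z)
        h = relu(Z)
    return Zs

def separated_loss(Ws, Zs, X, y, lam):
    # Hypothetical separated objective: each Z_l is a free variable, tied to
    # its own layer only through a quadratic penalty, so no term is deeply nested.
    penalty, prev = 0.0, X
    for W, Z in zip(Ws[:-1], Zs):
        penalty += 0.5 * lam * np.sum((Z - prev @ W.T) ** 2) / len(y)
        prev = relu(Z)
    return cross_entropy(prev @ Ws[-1].T, y) + penalty
```

When the auxiliary variables come from `forward_aux`, every penalty term vanishes and `separated_loss` equals `nested_loss`; an optimizer is then free to move the `Zs` away from that point and update each layer's weights through a shallow subproblem.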

Findings

01

Layer separation models decompose the deeply nested cross-entropy training problem into a sequence of more manageable subproblems.

02

Theoretical analysis shows that the new layer separation loss is an upper bound on the original cross-entropy loss.

03

Numerical experiments demonstrate improved training performance.
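One standard ingredient for upper bounds of this kind (an assumption about the proof route, not taken from the paper) is that softmax cross-entropy is √2-Lipschitz in its logits: the gradient with respect to the logits u is softmax(u) − e_y, whose Euclidean norm is at most √2. Hence CE(u, y) ≤ CE(v, y) + √2‖u − v‖, which lets a loss on auxiliary logits v plus a mismatch term control the loss on the true logits u. The sketch below checks this inequality numerically; the helper names `ce` and `bound_slack` are illustrative.

```python
import numpy as np

def ce(logits, y):
    # Softmax cross-entropy of one example: logsumexp(logits) - logits[y].
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum()) - logits[y]

def bound_slack(u, v, y):
    # Slack in CE(u, y) <= CE(v, y) + sqrt(2) * ||u - v||_2; should never be negative.
    return ce(v, y) + np.sqrt(2.0) * np.linalg.norm(u - v) - ce(u, y)

rng = np.random.default_rng(1)
worst = min(
    bound_slack(rng.standard_normal(10) * 5, rng.standard_normal(10) * 5, int(rng.integers(10)))
    for _ in range(10_000)
)
print(f"smallest slack over 10,000 random pairs: {worst:.4f}")
```

The slack stays non-negative for every sampled pair, as the Lipschitz argument predicts; chaining it through Lipschitz activations layer by layer is one way a separated loss can dominate the nested one.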

Abstract

This paper investigates the deep learning optimization problem with softmax cross-entropy loss. We propose a layer separation strategy to alleviate the strong nonconvexity encountered when training deep networks. For cross-entropy models with fully connected and convolutional neural networks, we introduce auxiliary variables associated with hidden-layer outputs and construct corresponding layer separation models, which decompose the original deeply nested optimization problem into a sequence of more manageable subproblems. We also conduct theoretical analyses, proving that the new layer separation loss provides an upper bound for the original cross-entropy loss. Moreover, we design alternating minimization algorithms and prove that, under appropriate conditions, these algorithms decrease the loss function. Numerical experiments validate the effectiveness of the…
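The alternating minimization idea can be sketched on a one-hidden-layer tanh network with a single auxiliary variable Z for the hidden pre-activations. This is a minimal sketch under stated assumptions, not the authors' algorithm: the objective is CE(tanh(Z) W2ᵀ, Y) + (λ/2n)‖Z − X W1ᵀ‖², W1 is updated by exact least squares, and Z and W2 take gradient steps with Armijo backtracking. Because every block update either exactly minimizes its block or is accepted only on sufficient decrease, the recorded objective is non-increasing, a simple analogue of the decreasing property described above. All names are hypothetical.

```python
import numpy as np

def softmax(L):
    E = np.exp(L - L.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def objective(W1, Z, W2, X, Y, lam):
    # Mean cross-entropy on auxiliary features plus quadratic coupling penalty.
    n = X.shape[0]
    L = np.tanh(Z) @ W2.T
    m = L.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(L - m).sum(axis=1))
    ce = np.mean(lse - (L * Y).sum(axis=1))  # Y is one-hot
    return ce + 0.5 * lam / n * np.sum((Z - X @ W1.T) ** 2)

def grads(W1, Z, W2, X, Y, lam):
    # Gradients of the objective with respect to Z and W2.
    n = X.shape[0]
    A = np.tanh(Z)
    G = (softmax(A @ W2.T) - Y) / n
    gW2 = G.T @ A
    gZ = (G @ W2) * (1.0 - A ** 2) + lam / n * (Z - X @ W1.T)
    return gZ, gW2

def backtrack(f, x, g, t=1.0, c=1e-4, shrink=0.5, max_tries=30):
    # Armijo backtracking; keeps x unchanged if no sufficient decrease is found.
    fx = f(x)
    for _ in range(max_tries):
        x_new = x - t * g
        if f(x_new) <= fx - c * t * np.sum(g * g):
            return x_new
        t *= shrink
    return x

def alternating_minimization(X, Y, hidden, lam=1.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    d, k = X.shape[1], Y.shape[1]
    W1 = rng.standard_normal((hidden, d)) / np.sqrt(d)
    W2 = rng.standard_normal((k, hidden)) / np.sqrt(hidden)
    Z = X @ W1.T  # auxiliary variables start at their forward-pass values
    history = [objective(W1, Z, W2, X, Y, lam)]
    for _ in range(iters):
        # Block 1: W1 appears only in the penalty, so least squares minimizes it exactly.
        W1 = np.linalg.lstsq(X, Z, rcond=None)[0].T
        # Block 2: gradient step on the auxiliary variables Z.
        gZ, _ = grads(W1, Z, W2, X, Y, lam)
        Z = backtrack(lambda Zc: objective(W1, Zc, W2, X, Y, lam), Z, gZ)
        # Block 3: gradient step on the output weights W2.
        _, gW2 = grads(W1, Z, W2, X, Y, lam)
        W2 = backtrack(lambda Wc: objective(W1, Z, Wc, X, Y, lam), W2, gW2)
        history.append(objective(W1, Z, W2, X, Y, lam))
    return W1, Z, W2, history
```

Each subproblem here involves only one layer's parameters or one set of auxiliary variables, which is the practical payoff of separating the layers; the paper's conditions for its decrease guarantee may differ from the backtracking safeguard used here.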
