# Yet Unnoticed in LSTM: Binary Tree Based Input Reordering, Weight Regularization, and Gate Nonlinearization

**Authors:** Mojtaba Moattari

arXiv: 2509.00087 · 2025-09-03

## TL;DR

This paper introduces novel input reordering, weight normalization, and gate nonlinearization techniques for LSTMs, demonstrating improved accuracy in text classification tasks by better focusing on long-term information.

## Contribution

It proposes new methods for input reordering, weight normalization, and gate nonlinearization in LSTMs, which have not been previously explored together, to enhance long-term information modeling.

## Key findings

- Improved text classification accuracy with proposed methods.
- Optimal norm selection for weight normalization enhances model performance.
- Nonlinearized gates better capture nonlinear input relationships.

## Abstract

LSTM models used in current Machine Learning literature and applications, has a promising solution for permitting long term information using gating mechanisms that forget and reduce effect of current input information. However, even with this pipeline, they do not optimally focus on specific old index or long-term information. This paper elaborates upon input reordering approaches to prioritize certain input indices. Moreover, no LSTM based approach is found in the literature that examines weight normalization while choosing the right weight and exponent of Lp norms through main supervised loss function. In this paper, we find out which norm best finds relationship between weights to either smooth or sparsify them. Lastly, gates, as weighted representations of inputs and states, which control reduction-extent of current input versus previous inputs (~ state), are not nonlinearized enough (through a small FFNN). As analogous to attention mechanisms, gates easily filter current information to bold (emphasize on) past inputs. Nonlinearized gates can more easily tune up to peculiar nonlinearities of specific input in the past. This type of nonlinearization is not proposed in the literature, to the best of author's knowledge. The proposed approaches are implemented and compared with a simple LSTM to understand their performance in text classification tasks. The results show they improve accuracy of LSTM.

---
Source: https://tomesphere.com/paper/2509.00087