Improving Neural Networks by Adopting Amplifying and Attenuating Neurons

Seongmun Jung; Oh Joon Kwon

arXiv:1905.09574·cs.NE·May 28, 2019

Improving Neural Networks by Adopting Amplifying and Attenuating Neurons

Seongmun Jung, Oh Joon Kwon

PDF

Open Access 1 Repo

TL;DR

This paper introduces simple amplifying and attenuating neurons that can be integrated into neural networks to improve their performance without significant computational costs, supported by theoretical and experimental evidence.

Contribution

It proposes novel neuron types that enhance neural network capabilities and demonstrates their effectiveness through theoretical analysis and numerical experiments.

Findings

01

Amplifying neurons increase neural network order.

02

Networks with these neurons achieve higher accuracy.

03

Performance improvements are validated experimentally.

Abstract

In the present study, an amplifying neuron and attenuating neuron, which can be easily implemented into neural networks without any significant additional computational effort, are proposed. The activated output value is squared for the amplifying neuron, while the value becomes its reciprocal for the attenuating one. Theoretically, the order of neural networks increases when the amplifying neuron is placed in the hidden layer. The performance assessments of neural networks were conducted to verify that the amplifying and attenuating neurons enhance the performance of neural networks. From the numerical experiments, it was revealed that the neural networks that contain the amplifying and attenuating neurons yield more accurate results, compared to those without them.

Tables2

Table 1. Table 1 : The MAE and SD of the error from the neural networks according to the network configurations.

Network	Depth	$n_{total neurons}$	$n_{amplifying}$	$n_{attenuating}$	MAE	SD
		Per single hidden layer			Error ( $x \in [0, 2 π]$ )
Network 1	5	10	0	0	0.085079	0.102091
Network 2	5	10	1	0	0.048816	0.066942
Network 3	5	10	0	1	0.072480	0.090168
Network 4	5	10	5	0	0.005758	0.013647
Network 5	5	10	4	1	0.005485	0.008369
Network 6	5	10	0	5	0.013222	0.023051
Network 7	9	10	0	0	0.003283	0.006276
Network 8	9	10	4	1	0.002212	0.005303

Table 2. Table 2 : The neural network performance comparisons by network configurations.

Network	Depth	$n_{total neurons}$	$n_{amplifying}$	$n_{attenuating}$	MAE	SD
		Per single hidden layer			Error ( $x, y \in [- 3, 3]$ )
Network 9	6	10	0	0	0.238485	0.382490
Network 10	6	10	3	1	0.056000	0.089219

Equations14

y^{k + 1} = F (W^{k} y^{k})

y^{k + 1} = F (W^{k} y^{k})

{\int f (x) d x = \nicefrac 12 {f (x)}^{2} (f (x) = ReLU) \int f (x) d x \approx \nicefrac 12 {f (x)}^{2} (f (x) = softplus)

{\int f (x) d x = \nicefrac 12 {f (x)}^{2} (f (x) = ReLU) \int f (x) d x \approx \nicefrac 12 {f (x)}^{2} (f (x) = softplus)

{F (x) = G \circ H (x) G (x) = x^{2}

{F (x) = G \circ H (x) G (x) = x^{2}

⎩ ⎨ ⎧ F (x) = G \circ H (x) G (x) = \frac{x}{x ^{2} + b} \approx \frac{1}{x}

⎩ ⎨ ⎧ F (x) = G \circ H (x) G (x) = \frac{x}{x ^{2} + b} \approx \frac{1}{x}

f (x) = a x + (1 - a) ln (e^{x} + 1) = x + (1 - a) ln (e^{- x} + 1)

f (x) = a x + (1 - a) ln (e^{x} + 1) = x + (1 - a) ln (e^{- x} + 1)

y = ln (x + 0.5) + 0.2 sin x + 0.4 sin 2 x + 0.3 sin 3 x - 0.1 sin 5 x - 0.2 sin 7 x + 0.15 sin 20 x

y = ln (x + 0.5) + 0.2 sin x + 0.4 sin 2 x + 0.3 sin 3 x - 0.1 sin 5 x - 0.2 sin 7 x + 0.15 sin 20 x

f (x, y) = - 20 exp - 0.2 \frac{x ^{2} + y ^{2}}{2} - exp (\frac{cos 2 π x + cos 2 π y}{2}) + 20 + exp (1)

f (x, y) = - 20 exp - 0.2 \frac{x ^{2} + y ^{2}}{2} - exp (\frac{cos 2 π x + cos 2 π y}{2}) + 20 + exp (1)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

TheWinterSky/Amplifying-and-attenuating-neurons
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNeural Networks and Applications · Fuzzy Logic and Control Systems · Currency Recognition and Detection

Full text

Improving Neural Networks by Adopting Amplifying and Attenuating Neurons

Seongmun Jung

Department of Aerospace Engineering

KAIST, Daejeon, Republic of Korea

[email protected]

&Oh Joon Kwon

Department of Aerospace Engineering

KAIST, Daejeon, Republic of Korea

[email protected]

Abstract

In the present study, an amplifying neuron and attenuating neuron, which can be easily implemented into neural networks without any significant additional computational effort, are proposed. The activated output value is squared for the amplifying neuron, while the value becomes its reciprocal for the attenuating one. Theoretically, the order of neural networks increases when the amplifying neuron is placed in the hidden layer. The performance assessments of neural networks were conducted to verify that the amplifying and attenuating neurons enhance the performance of neural networks. From the numerical experiments, it was revealed that the neural networks that contain the amplifying and attenuating neurons yield more accurate results, compared to those without them.

1 Introduction

Artificial neural networks with a sufficient number of neurons are universal approximators [1]; therefore, properly trained networks with a sufficient number of hidden neurons and hidden layers are able to approximate any unknown functions. Over the past few decades, the studies on the weights initialization [2, 3] and activation functions [4, 5, 3] were conducted, and now it is possible to train the neural networks having multiple hidden layers. Hence, neural networks with multiple hidden layers have been successfully adopted to various areas, such as image classification [6], natural language processing [7], and numerical model improvement [8].

Since neural networks are theoretically able to approximate any functions within small error ranges as its size increases [6], large neural networks are preferred for handling complicated tasks. However, large neural networks also suffer from the gradient vanishing/exploding and degradation problem [9]. In addition, as the network size gets larger, the required computational effort increases. Recently, Residual Network [6] and Batch Normalization [10] enable to train large neural networks. Nonetheless, additional computations are required, and the size of neural networks is still limited depending on the computational power available. Therefore, it is crucial to enhance the efficiency and performance of neural network models under the constraint of a specified network size.

From the early neural network models [11, 12, 13], which were developed by imitating biological nervous systems, several studies have been conducted to improve the neural network model. For instance, instead of squashing functions, which are similar to the “all-or-none” activation process [14] of the real neuron, the Rectified Linear Unit (ReLU) function [4] is widely used as an activation function for neural networks. It was also found that neural networks yield improved results when the squared value of the input component is augmented into the input layer [15], and when Batch Normalization method [10] is adopted.

In the present study, an amplifying neuron and attenuating neuron are proposed in order to improve the neural network model. The amplifying neuron amplifies the activated value of the neuron, and the attenuating neuron attenuates the activated value. Those amplifying and attenuating neurons can be readily implemented into the neural network model by applying secondary activation functions. Theoretically, neural networks are possible to perform the multiplication and approximated division of the inputs with the amplifying and attenuating neurons. In addition, the amplifying neuron makes the order of neural networks higher. Therefore, the neural network that contains the amplifying neuron is expected to address the tasks that are more complicated. Such characteristics make it possible to use smaller neural networks for identical tasks, and to obtain smaller error ranges for the neural networks of an identical size.

2 Neural networks

Neural networks are often referred as black boxes, since it is hard to figure out how they work. Especially, the difference between neural networks and biological nervous systems makes it difficult to apply biological intuitions to the neural networks model. For instance, the Heaviside step function tends to be avoided as an activation function, in spite of its similarity to the “all-or-none” law, since its derivative is zero throughout the whole domain. Therefore, the Heaviside function is not suitable for the error back-propagation [16] learning process. Nonetheless, the neural network model is fundamentally an algebraic model that consists of non-linear activations and matrix multiplication operations. Hence, it is possible to conduct an algebraic and numerical analysis of the neural network model.

[TABLE]

In the neural network model, the output vector of each layer is calculated by using the output vector of the previous layer as in Eq. (1). Therefore, the range of the previous layer becomes the domain of the following layer, and this process can be expressed via the function composition. Since the composite function of two linear functions is also a linear function, the output of the neural network consists of linear elements, as shown in figure 1, when the piecewise linear activation functions without any curvature (e.g., the ReLU and LReLU) are used. Furthermore, the derivative of the neural network output is composed of discontinuous constant elements, and it can be considered as the linear combination of the Heaviside step functions with different biases. Such characteristics imply that the neural network with the activation function of the Heaviside step function can be obtained by differentiating the neural network with piecewise linear activation functions.

It is not surprising that the neural network with the Heaviside step activation function is the derivatives of the network with the ReLU, as the Heaviside step function is the derivative of the ReLU. Namely, the neural network model is the integrated model of the biological nervous system. Therefore, the order of the neural network model is higher than the biological one, and less number of neurons are expected to be required for identical tasks. Nonetheless, the neural network model is still needed to be improved. In the case of one-dimensional input and output, neural networks yield similar results to the Euler method, which is generally avoided because of its accuracy.

One approach to the problem is to use the modified piecewise linear activation function having curvature, such as the ELU [5] and softplus [17]. With those activation functions, the output derivative of the neural network becomes continuous; hence, the performance of the neural network increases. However, it is clear that the improvement of the neural network with those activation functions is limited, as they are highly similar to piecewise linear activation functions. Another approach is to implement a high-order polynomial as the non-linear activation function of the network [18, 19]. Although such approach may increase the order of the neural network, the neural network can easily become divergent, especially when the degree of the polynomial activation function is high. In addition, the improvement of the neural network performance is not guaranteed, because there exists no switching feature for the polynomial, and the derivative of the quadratic polynomial is not non-linear, but a linear function.

3 Amplifying and attenuating neurons

3.1 Amplifying neurons

As the output of the previous layer of the neural network becomes the input for the following layer, this process can be expressed via the function composition. Therefore, the order of the neural network can be inferred by the function composition of the network activation function, provided that the activation function is continuous. For instance, the neural network with the sigmoid activation function is a zeroth-order model, as the iterated function of the sigmoid function converges to a constant value function. In a similar manner, the iterated functions of the ReLU, LReLU, PReLU, ELU, and softplus converge to the ReLU; therefore, the neural network using those activation functions works as a first-order model.

[TABLE]

To improve the performance of the neural network, the order of the neural network model needs to be increased. The order of the neural network increases when the high-order activation functions are used. In the present study, an amplifying neuron that provides the high-order activation function is introduced. The output of the amplifying neuron is determined to be the square of the activated value, since the exact integral of the ReLU is the square of the ReLU. In addition, the square of the PReLU with $a$ is the exact integral of the PReLU with $a^{2}$ . Moreover, the integral of the softplus can be approximated by the squared softplus as in Eq. (2). Therefore, the amplifying neuron is an ordinary neuron with second-order activation functions when first-order activation functions are used. Although the order of the sigmoid function does not increase by squaring, it is not an issue, since the softplus can be used to increase the order of the sigmoid function.

[TABLE]

The output of the amplifying neuron can be expressed using the function composition as in Eq. (3). Hence, the amplifying neuron can easily be implemented into the neural network by applying the secondary activation function $G(x)$ with the original activation function $H(x)$ . The coefficient $\nicefrac{{1}}{{2}}$ in Eq. (2) is not considered in Eq. (3), as it can be considered by the weights between neurons.

3.2 Attenuating neurons

Although the amplifying neuron is the neuron that provides high-order activation functions, it can also be considered as the ordinary neuron that amplifies the input. Here, an attenuating neuron, which performs an opposite operation compared to the amplifying neuron, is also introduced. The main purpose of introducing this attenuating neuron is to provide multiplicative inverse value to the neural network, since it is inefficient to approximate such a value with high-order activation functions.

[TABLE]

The attenuating neuron can be readily implemented into the neural network by using the secondary activation function $G(x)$ in a manner similar to the amplifying neuron implementation. However, the reciprocal of $x$ cannot be directly used as the secondary activation function $G(x)$ , since it contains a singularity where $x$ is zero. To avoid this singularity of the reciprocal function, the modified reciprocal function without a singularity shown in Eq. (4) is adopted as the secondary activation function $G(x)$ for the attenuating neuron with $b$ of one.

3.3 Neural networks with amplifying and attenuating neurons

Using the function composition, the amplifying and attenuating neurons can be easily implemented into the neural network, as described in algorithm 1, without any significant additional computational effort. For the amplifying and attenuating neurons, the $H(x)$ value is no longer needed once the secondary activation function $G$ is applied. Therefore, no additional memory is required for the amplifying and attenuating neurons. For the error back-propagation, the activation function derivative of the amplifying and attenuating neurons can be calculated during the forward propagation process by using the chain rule. Once the activation function derivative is properly calculated, the error back-propagation method can be used without any modification.

With the amplifying and attenuating neurons, the performance of the neural network can be improved. For instance, the order of the neural network that contains the amplifying neuron increases, as the high-order activation function is provided. In addition, theoretically, the neural network with the amplifying and attenuating neurons can perform multiplication and approximated division, as depicted in figure 2. It is expected that such characteristics enhance the efficiency of the neural network; i.e., the neural network with the amplifying and attenuating neurons can handle more complicated data without increasing the network size.

However, high-order models are more unstable in general, compared to first-order models. Therefore, replacing the entire neurons of the neural network with the amplifying and attenuating neurons is likely to cause numerical problems, especially for the large neural network. For instance, the order of the neural network only having high-order activation functions increases without limits, as the number of stacked hidden layers increases. Such a numerical instability problem can be resolved by limiting the number of the amplifying and attenuating neurons, provided that the rest of neurons have first-order activation functions.

4 Experiments

To evaluate the performance improvement of the neural network with the amplifying and attenuating neurons, one-dimensional and two-dimensional problems were considered. In the performance assessments, the multilayer perceptron model was adopted as a representative neural network model.

[TABLE]

Here, the parametric softplus function, defined in Eq (5), with $a$ of 0.3 was used as the activation function $F(x)$ for the ordinary neuron, and also as the first activation function $H(x)$ for the amplifying and attenuating neurons. For the bias neuron and the neuron in the output layer, no activation function was applied.

The $L_{2}$ weight regularization method was used to minimize overfitting phenomenon. Also, for the robust training of the neural network, ADAM method [20] was utilized. In the training process, a batch size of one was used, since the neural network trained with small batch methods generally outperforms the one with large batch methods [21]. In the present study, for both numerical experiments, the best evaluation out of ten independent evaluations was dealt for each of network configurations. All the training data sets, trained networks, and source code dealt are available at

https://github.com/TheWinterSky/Amplifying-and-attenuating-neurons

4.1 One-dimensional problem

In the first numerical experiment, neural networks were required to reproduce a one-dimensional function. The one-dimensional function (i.e., the exact solution) consists of multiple transcendental functions, as shown in Eq. (6), so that it cannot be exactly expressed by the finite Taylor series. Such a characteristic is necessary for evaluating the performance improvement of the high-order neural network model, since the output of the neural network with the amplifying and attenuating neurons may possess a form of polynomials. As shown in figure 3, the exact solution shows a clear tendency over the interval $[0,2\pi]$ , and exhibits an alternating sign of its derivative. Therefore, the neural networks were required to reproduce not only the overall tendency, but also the high frequency feature of the exact solution. To construct the training data set, 194 discrete pairs of $x$ and their corresponding exact solution values were used. The network configurations and evaluation results (i.e., the mean absolute error (MAE) and standard deviation (SD) of error) are summarized in table 1.

[TABLE]

To evaluate the effect of the amplifying and attenuating neurons, the MAE and SD of error were compared for the neural networks of an identical size, i.e., Network 1~6. From Network 2 to 6, the amplifying and attenuating neurons are placed to the entire hidden layers. As shown in figure 4, Network 1, which does not have any amplifying and attenuating neurons, successfully predicts the overall tendency of the exact solution, although the detailed behavior, such as the high frequency feature, is not reproduced properly because of the insufficient network size. It was found that the performance of the neural networks increases when the amplifying and attenuating neurons are placed in the networks. In addition, the neural networks with the amplifying neurons produce better results, compared to those with the attenuating neurons. Nonetheless, Network 5 that contains both the amplifying and attenuating neurons, not Network 6, produces the most accurate results among the networks of an identical size, and Network 5 successfully reproduces the high frequency feature of the exact solution as shown in the figure 4. It is estimated that Network 5 outperforms Network 4 because reciprocal features are provided by the attenuating neuron for Network 5.

It is usually known that the main concern of large networks is the occurrence of overfitting phenomenon. To examine whether the neural network with the amplifying and attenuating neurons suffers from overfitting, the neural networks of a sufficiently large size were also considered. As observed from the MAE of Network 7, the neural network with nine hidden layers is enough to handle the exact solution. In order to avoid the order of Network 8 being excessively high, the amplifying and attenuating neurons were placed from the first hidden layer to the fifth. Compared to Network 7, no significant degradation of the output was observed for Network 8 in the continuous domain; therefore, overfitting phenomenon did not occur.

4.2 Two-dimensional problem

As the second numerical experiment, the two-dimensional Ackley function [22] was chosen to be the exact solution. The Ackley function consists of exponential and sinusoidal functions as in Eq. (7). It was required that neural networks reproduce the exact solution based on the training data set that consists of 300 randomly chosen discrete points illustrated in figure 5. The MAE and SD of the error from the evaluations and the network configurations are shown in table 2. To limit the order of Network 10, the amplifying and attenuating neurons are placed from the second hidden layer to the fifth for Network 10.

[TABLE]

From the comparison of the MAE and SD of error between Network 9 and 10, it was observed that the absolute error drastically reduces when the amplifying and attenuating neurons are placed in the neural network as shown in figure 6. Both the MAE and SD of error decrease over the entire continuous domain area, especially in the region included in the training data set. However, it was found that the maximum value of the absolute error was developed from outside of the training data set. Such a behavior is considered to originate from the nature of the high-order model. Generally, high-order models are not appropriate for extrapolation, although they outperform the first-order models when interpolation is performed instead.

Such a numerical behavior in the region outside of the training data set can be alleviated when the input of the amplifying neuron is normalized. Hence, Batch Normalization method is expected to relieve the problem, since all the input distributions of the hidden neurons are rescaled and shifted. Nonetheless, Batch Normalization also has a possibility of limiting the performance improvement of the neural network that contains the amplifying and attenuating neurons, since Batch Normalization can suppress the high-order characteristics itself.

5 Conclusion

In the present study, an amplifying neuron and attenuating neuron are proposed for improving the performance of the neural network. Amplifying neurons square the activated output value of the neuron, and thus they can be considered as ordinary neurons with a second-order activation function. Meanwhile, attenuating neurons yield an approximated reciprocal activated output value of neurons. In theory, the order of the neural network increases when the amplifying neuron is placed. Therefore, the neural network model with the amplifying neuron becomes a high-order model. Furthermore, additional features can also be provided when the attenuating neuron is placed in the hidden layers.

From the numerical experiments, it was revealed that the neural networks that contain the amplifying and attenuating neurons outperform those without them. In addition, the neural networks having both the amplifying and attenuating neurons yield the most accurate results. Despite of the performance improvement, it was found that the output value of the neural networks containing the amplifying and attenuating neurons can be immoderate for the extrapolation region, outside of the training data set. It is expected that such a behavior can be alleviated when Batch Normalization method is adopted, although it may limit the performance improvement.

Bibliography22

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Hornik et al. [1989] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks , 2(5):359–366, 1989.
2Glorot and Bengio [2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics , pages 249–256, 2010.
3He et al. [2015] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision , pages 1026–1034, 2015.
4Nair and Hinton [2010] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) , pages 807–814, 2010.
5Clevert et al. [2015] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). In International Conference for Learning Representations , 2015.
6He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 770–778, 2016.
7Socher et al. [2013] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing , pages 1631–1642, 2013.
8Singh et al. [2017] Anand Pratap Singh, Shivaji Medida, and Karthik Duraisamy. Machine-learning-augmented predictive modeling of turbulent separated flows over airfoils. AIAA Journal , pages 2215–2227, 2017.